Abstract

We consider exact recovery of any $m \times n$ matrix of rank $\varrho$ from a small number of observed entries via nuclear norm minimization (1). Such matrices have $(m+n)\varrho - \varrho^2$ degrees of freedom. We show that any low-rank matrix can be recovered exactly from $O(((m+n)\varrho - \varrho^2)\log^2(m+n))$ randomly sampled entries. This matches the lower bound on the required number of entries (in terms of degrees of freedom) up to an additional factor of $O(\log^2(m+n))$. To achieve this bound, we observe each entry with probability proportional to the sum of the corresponding row and column leverage scores, minus their product. We show that this relaxation of the sampling probabilities (as opposed to the sum of leverage scores used in [5]) can yield an $O(\varrho^2\log^2(m+n))$ improvement on the best known sample size, which was obtained by [5] for problem (1). Experiments on real data corroborate the theoretical improvement in sample size.

Further, our relaxed leverage score sampling, via (1), can exactly recover (a) incoherent matrices (with restricted leverage scores) and (b) matrices in which only one of the row or column spaces is incoherent, without knowing the leverage scores a priori. In these settings we can also achieve an improvement in sample size.


1 Introduction 

Suppose we have a data matrix $M \in \mathbb{R}^{m\times n}$ with incomplete or missing entries, i.e., we have information about only a small number of elements of M. The matrix completion problem ([2]) is to predict those missing entries as accurately as possible based on the observed entries. Such partially observed data appear in many application domains. For example, in a recommendation system (a.k.a. collaborative filtering) we have incomplete user ratings for various products, and the goal is to predict each user's preferences for all the products (e.g., the Netflix problem). The incomplete data could also represent a partial distance matrix in a sensor network, or missing pixels in digital images caused by occlusion or tracking failures in a video surveillance system.

More formally, we have information about the entries $M_{ij}$, $(i,j) \in \Omega$, where $\Omega \subseteq [m]\times[n]$ is a sampled subset of all entries and $[n]$ denotes the set $\{1,\dots,n\}$. The problem is to recover the unknown matrix M in a computationally tractable way from as few observed entries as possible. However, without further assumptions on M it is impossible to predict the unobserved elements from a limited number of known entries. One popular assumption is that M is low-rank, say of rank $\varrho$. Such matrices have $(m+n)\varrho - \varrho^2$ degrees of freedom, i.e., this many parameters determine all the entries. Hence, if the number of observed entries is $s = |\Omega| < (m+n)\varrho - \varrho^2$, there can be infinitely many matrices of rank at most $\varrho$ with exactly the same entries in $\Omega$. In that case, exact recovery of the unobserved entries is impossible. So, in general, we need at least $(m+n)\varrho - \varrho^2$ observed entries for exact matrix completion.

The matrix M, with its observed entries, can be interpreted as an element of an $mn$-dimensional linear space for which $O((m+n)\varrho - \varrho^2)$ coordinates are available. The set of matrices compatible with the observed entries forms a large affine space. The exact matrix completion problem is then to specify an efficient algorithm that uniquely picks M from this high-dimensional affine space ([9]). Since the target matrix M is low-rank, a natural approach is to find the matrix of minimum rank that is consistent with the observed entries. However, minimizing rank over an affine space is known to be NP-hard. Instead, the convex heuristic (1) (a surrogate for rank minimization, [7]) has been proposed to recover the low-rank matrix M:

$$\min_{X \in \mathbb{R}^{m\times n}} \|X\|_* \quad \text{s.t.} \quad X_{ij} = M_{ij},\ (i,j) \in \Omega, \qquad (1)$$

where the nuclear norm $\|X\|_*$ of a matrix X is the sum of its singular values, $\|X\|_* = \sum_i \sigma_i(X)$. Problem (1) is convex and can be solved efficiently via semidefinite programming. Proving exact matrix completion therefore amounts to proving that the nuclear norm, restricted to the affine space, has a strict global minimum at M. That is, for any matrix $M + Z \neq M$ in the affine space in (1), we need to show $\|M+Z\|_* > \|M\|_*$. [2], [13], and others developed the sufficient conditions and probabilistic tools to recover M as the unique solution to (1).
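The degrees-of-freedom count $(m+n)\varrho - \varrho^2$ and the necessary condition on $|\Omega|$ that follows from it are simple arithmetic. The sketch below evaluates both. The helper names are hypothetical and used only for illustration. For a full-rank matrix the count reduces to $mn$, as it should.

```python
def degrees_of_freedom(m: int, n: int, rank: int) -> int:
    """Free parameters of an m x n matrix of the given rank: (m + n) * rank - rank**2."""
    if not 0 <= rank <= min(m, n):
        raise ValueError("rank must lie in [0, min(m, n)]")
    return (m + n) * rank - rank * rank


def recovery_impossible(num_observed: int, m: int, n: int, rank: int) -> bool:
    """True if fewer entries are observed than there are degrees of freedom,
    in which case infinitely many matrices of that rank agree on the observations."""
    return num_observed < degrees_of_freedom(m, n, rank)


# A 943 x 1682 matrix (the MovieLens size used in the experiments) truncated to rank 10.
print(degrees_of_freedom(943, 1682, 10))  # 26150
# Full rank: the count equals m * n.
print(degrees_of_freedom(3, 5, 3))        # 15
```

This check is only necessary. Observing at least this many entries does not by itself guarantee recovery via (1).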

One natural question is: which elements of M should we observe in (1)? In other words, how should we construct the sample set $\Omega$? We want to define probabilities on the entries of M. Most existing work focuses on the case where $\Omega$ in (1) is constructed by observing the entries of M uniformly at random ([2], [9], [13]). However, this data-oblivious sampling scheme has a cost. A very sparse matrix cannot be recovered by uniform sampling of its entries unless we observe almost all of them, because observing only zeros gives no way to predict the non-zeros. This suggests that M cannot lie in the null space of the sampling operator (defined later), which extracts the values of a subset of the entries. Matrices like this example can be characterized by the structure of their singular vectors: the singular vectors are (closely) aligned with the standard basis. Therefore, the components of the singular vectors should be sufficiently spread out to reduce the number of observations needed to recover a low-rank matrix. Such restrictions on the row and column spaces of a low-rank matrix are called incoherence assumptions (defined later). [9] and [13] showed that this restricted class of $n \times n$ matrices of rank $\varrho$ can be recovered exactly, with high probability, by observing as few as $O(n\varrho\log^2 n)$ uniformly sampled entries. Recently, [5] proposed observing entries with non-uniform probabilities proportional to the sum of the row and column leverage scores of M (leveraged sampling). They eliminated the need for the incoherence assumptions and showed that any $n \times n$ matrix of rank $\varrho$ can be recovered exactly, with high probability, from as few as $O(n\varrho\log^2 n)$ observed elements.

Similar to [5], we incorporate the row and column leverage scores of the matrix M to be recovered into the probability of observing an entry. However, we use a relaxed notion of leverage score sampling. Specifically, we propose to observe an entry with probability proportional to the sum of the corresponding row and column leverage scores, minus their product. Theorem 1 shows that, by observing entries according to this relaxed leverage score sampling in (3), we can recover any $m \times n$ matrix of rank $\varrho$ exactly via (1), with high probability, from $O(((m+n)\varrho - \varrho^2)\log^2(m+n))$ observed entries. This bound on the sample size is optimal, up to a $\log^2(m+n)$ factor, in the number of degrees of freedom of a rank-$\varrho$ matrix. It can also give an $O(\varrho^2\log^2 n)$ improvement on the sample size in [5] for the $n \times n$ case.

Consider an $n \times n$ matrix M of rank $\varrho$ whose column space is incoherent and whose row space is arbitrarily coherent. For this case, [5] gives a provable sampling scheme (using leveraged sampling) that requires no prior knowledge of the leverage scores of M. They show that such an M can be recovered exactly, with high probability, using a sample size of $\Theta(n\varrho\log^2 n)$. Our relaxed leverage scores can be incorporated into this setting, again with no prior knowledge of the leverage scores. This improves on the sample size obtained by [5] while recovering $M \in \mathbb{R}^{m\times n}$ exactly with high probability. Finally, our relaxed sampling probabilities can improve on the sample size of [5] even for uniform sampling of incoherent matrices. Table 1 summarizes some of the recent results in the literature.

Table 1: Summary of bounds on the sample size s for exact recovery of a matrix $M \in \mathbb{R}^{m\times n}$ of rank $\varrho$.

matrix type | probabilities    | bound on s                                                              | citation
incoherent  | uniform          | $s \ge O(\tau\,(m\vee n)\,\varrho\log(m\vee n))$                        | [2]
incoherent  | uniform          | $s \ge O(\max\{\mu_1^2, \mu_0\}(\lambda + \varrho^2)\log^2(2n))$        | [13]
any         | leveraged        | $E[s] \ge O((\lambda + \varrho^2)\log^2(m+n))$                          | [5]
any         | relaxed leverage | $E[s] \le O(\lambda\log^2(m+n))$                                        | Theorem 1

Here $\lambda = (m+n-\varrho)\varrho$, $m \vee n = \max\{m,n\}$, and $\tau = \max\{\mu_1^2,\ \mu_1^{1/2}\mu_0(m\vee n)^{1/4}\}$.

1.1 Notations and preliminaries 

$[n]$ denotes the set of natural numbers $\{1,\dots,n\}$. The natural logarithm of x is denoted by $\log(x)$. Matrices are bold uppercase, vectors are bold lowercase, and scalars are not bold. We denote the $(i,j)$-th entry of a matrix X by $X_{ij}$. $e_i$ denotes the i-th standard basis vector, whose dimension should be clear from the context. $X^T$ and $x^T$ denote the transpose of matrix X and vector x, respectively. $\mathrm{Tr}(X)$ denotes the trace of a square matrix X.

The spectral norm of X is denoted by $\|X\|_2$. The inner product between two matrices is $\langle X, Y\rangle = \mathrm{Tr}(X^TY)$. The Frobenius norm of X is denoted by $\|X\|_F$, and $\|X\|_F = \sqrt{\langle X, X\rangle}$. The maximum absolute entry of X is denoted by $\|X\|_\infty = \max_{i,j}|X_{ij}|$. For vectors, the Euclidean $\ell_2$ norm is denoted by $\|x\|_2$.

Linear operators acting on matrices are denoted by calligraphic letters. The spectral norm (largest singular value) of such an operator $\mathcal{A}$ is denoted by $\|\mathcal{A}\|_{op} = \sup_X \|\mathcal{A}(X)\|_F/\|X\|_F$. Also, we write $f(n) = \Theta(g(n))$ when $\alpha_1 g(n) \le f(n) \le \alpha_2 g(n)$ for some positive universal constants $\alpha_1, \alpha_2$.

2 Main Results 

Our goal is to define probabilities on the entries of M (i.e., to construct the sample set $\Omega$ in (1)) that reduce the sample size while keeping M the unique optimal solution to (1). We use the Bernoulli sampling model, in which each entry $(i,j)$ is observed independently with some probability $p_{ij}$. Before stating our main result and the sampling distribution, we first define the normalized leverage scores ([13], [5]).


Definition 1 Let $M \in \mathbb{R}^{m\times n}$ be of rank $\varrho$ with SVD $M = U\Sigma V^T$, where U and V are the left and right singular matrices, respectively, and $\Sigma$ is the diagonal matrix of singular values. The normalized leverage scores of the i-th row (denoted by $\mu_i$) and of the j-th column (denoted by $\nu_j$) are defined as follows:

$$\mu_i = \frac{m}{\varrho}\|U^Te_i\|_2^2,\ \forall i \in [m], \qquad \nu_j = \frac{n}{\varrho}\|V^Te_j\|_2^2,\ \forall j \in [n]. \qquad (2)$$
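Definition 1 can be evaluated without a full SVD. The scores depend only on the column and row spaces, and any orthonormal basis of those spaces gives the same values as U and V. The sketch below makes two assumptions for illustration: it uses a plain-Python modified Gram-Schmidt with a hypothetical tolerance `tol` instead of an SVD routine, and it uses hypothetical helper names. It is numerically naive and only meant for small examples.

```python
import math


def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt: an orthonormal basis (list of vectors) for span(vectors).
    Vectors whose residual norm falls below `tol` are treated as dependent and skipped."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for q in basis:
            coef = sum(a * b for a, b in zip(q, w))
            w = [a - coef * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:
            basis.append([a / norm for a in w])
    return basis


def normalized_leverage_scores(M, rank, tol=1e-10):
    """mu_i = (m / rank) ||U^T e_i||^2 and nu_j = (n / rank) ||V^T e_j||^2 as in (2).
    Any orthonormal basis of the column (row) space gives the same scores as U (V)."""
    m, n = len(M), len(M[0])
    columns = [[M[i][j] for i in range(m)] for j in range(n)]
    Q_col = orthonormal_basis(columns, tol)  # spans the column space, like U
    Q_row = orthonormal_basis(M, tol)        # spans the row space, like V
    if len(Q_col) != rank or len(Q_row) != rank:
        raise ValueError("numerical rank differs from `rank`")
    mu = [(m / rank) * sum(q[i] ** 2 for q in Q_col) for i in range(m)]
    nu = [(n / rank) * sum(q[j] ** 2 for q in Q_row) for j in range(n)]
    return mu, nu


# Rank-2 example: row 3 = row 1 + row 2, row 4 = 0.
M = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2],
     [0, 0, 0]]
mu, nu = normalized_leverage_scores(M, rank=2)
print([round(x, 4) for x in mu])  # [1.3333, 1.3333, 1.3333, 0.0]
print([round(x, 4) for x in nu])  # [1.0, 1.0, 1.0]
```

The all-zero fourth row gets $\mu_4 = 0$. Such a row never contributes to the low-rank subspace.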


Normalized leverage scores¹ are non-negative and depend on the structure of the row and column spaces of the matrix. Also, $\sum_i \mu_i\varrho/m = \sum_j \nu_j\varrho/n = \varrho$, because U and V have orthonormal columns. We now state our main result.

Theorem 1 Let $M \in \mathbb{R}^{m\times n}$ be of rank $\varrho$. Suppose we have a subset of observed entries $\Omega \subseteq [m]\times[n]$, where each entry $(i,j)$ is observed independently with probability $p_{ij}$ such that

$$p_{ij} = \max\left\{\min\left\{c_1 L_{ij}\log^2(m+n),\ 1\right\},\ (mn)^{-5}\right\}, \qquad (3)$$

where $L_{ij} = \frac{\mu_i\varrho}{m} + \frac{\nu_j\varrho}{n} - \frac{\mu_i\varrho}{m}\cdot\frac{\nu_j\varrho}{n}$ and $c_1 > 0$ is some universal constant. Then M is the unique optimal solution to (1) with probability at least $1 - 33\log(m+n)(m+n)^{3-c}$, for sufficiently large $c > 3$. Moreover, if the number of entries observed according to (3) is $\Theta(((m+n)\varrho - \varrho^2)\log^2(m+n))$, then M is the unique optimal solution to (1) with probability at least $1 - 66\log(m+n)(m+n)^{3-c}$, for sufficiently large $c > 3$.
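Here is a sketch of the sampling probabilities (3) and the resulting expected sample size $E[s] = \sum_{i,j} p_{ij}$. The function names and the example leverage scores (those of a small $4 \times 3$ rank-2 matrix) are assumptions chosen for illustration. The constant $c_1$ is deliberately small so that no probability is clipped at 1.

```python
import math


def relaxed_probability(mu_i, nu_j, m, n, rank, c1):
    """p_ij from (3): max{min{c1 * L_ij * log^2(m + n), 1}, (mn)^-5}."""
    a = mu_i * rank / m   # ||U^T e_i||^2, lies in [0, 1]
    b = nu_j * rank / n   # ||V^T e_j||^2, lies in [0, 1]
    L = a + b - a * b     # relaxed leverage score L_ij
    return max(min(c1 * L * math.log(m + n) ** 2, 1.0), float(m * n) ** -5)


def expected_sample_size(mu, nu, rank, c1):
    """E[s] = sum_ij p_ij under the Bernoulli model of Theorem 1."""
    m, n = len(mu), len(nu)
    return sum(relaxed_probability(a, b, m, n, rank, c1) for a in mu for b in nu)


# Normalized leverage scores of a 4 x 3 rank-2 matrix (assumed example values).
mu = [4 / 3, 4 / 3, 4 / 3, 0.0]
nu = [1.0, 1.0, 1.0]
print(expected_sample_size(mu, nu, rank=2, c1=1e-3))
```

Without clipping, the sum equals $c_1\log^2(m+n)\sum_{i,j}L_{ij}$. For this example that is $c_1\log^2(7)\cdot 10$, since $(m+n)\varrho - \varrho^2 = 10$.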

Row and column leverage scores measure the contribution of a row or column to the low-rank subspace. The probabilities in (3) are biased towards the leverage score structure of the matrix to be recovered. This suggests that elements in the important rows and columns of a matrix, indicated by high leverage scores $\{\mu_i\}$ and $\{\nu_j\}$, should be observed more frequently. Doing so reduces the number of observations needed for exact matrix completion. [5] also noticed this and proposed using the sum of $\mu_i\varrho/m$ and $\nu_j\varrho/n$ in the sampling probabilities. Our probabilities in (3) reduce this bias by subtracting the term $\frac{\mu_i\varrho}{m}\cdot\frac{\nu_j\varrho}{n}$, while keeping the leverage score pattern in $p_{ij}$. Compared to [5], this relaxation reduces, in an additive sense, the number of observations needed to recover the low-rank matrix exactly via (1).

One simple intuition behind this relaxation comes from basic set theory. Let $u_i$ and $v_j$ be the indicator events of the row and column leverage scores of the matrix to be recovered, respectively, with probabilities $p(u_i) = \mu_i\varrho/m$ and $p(v_j) = \nu_j\varrho/n$. We want to sample the index $(i,j)$ according to the row or the column leverage scores. In other words, the sampling probability $p_{ij}$ should be proportional to $p(u_i \vee v_j) = p(u_i) + p(v_j) - p(u_i \wedge v_j)$. Now, $u_i$ and $v_j$ are independent: an element with high row leverage may or may not have high column leverage, and vice versa. Thus $p(u_i \wedge v_j) = p(u_i)\,p(v_j)$, and $p_{ij} \propto L_{ij}$ in (3).

A practical implication of this relaxation is as follows. We can write $p_{ij} \propto L_{ij} = \frac{\mu_i\varrho}{m} + \left(1 - \frac{\mu_i\varrho}{m}\right)\frac{\nu_j\varrho}{n}$, where $0 \le \frac{\mu_i\varrho}{m}, \frac{\nu_j\varrho}{n} \le 1$. Suppose the j-th column is important, i.e., $\nu_j\varrho/n$ is high (say, close to 1). Then we want to observe all the elements along that column, and indeed $L_{ij} \approx \nu_j\varrho/n \approx 1$. This removes the need for row leverage information for the elements of a column with a high leverage score, and vice versa. This reduction in the information required for exact completion can lead to a smaller sample set in (1).
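The following check of the set-theoretic intuition uses exact rational arithmetic. Enumerating the four outcomes of two independent events reproduces $L_{ij}$ and its form $a + (1-a)b$. A fully important column ($\nu_j\varrho/n = 1$) gets $L_{ij} = 1$ whatever the row leverage is. The function names are hypothetical, and the independence of $u_i$ and $v_j$ is the modeling assumption stated above.

```python
from fractions import Fraction
from itertools import product


def union_probability(a, b):
    """P(u or v) for independent events with P(u) = a, P(v) = b,
    computed by enumerating the four joint outcomes."""
    total = Fraction(0)
    for u, v in product([True, False], repeat=2):
        weight = (a if u else 1 - a) * (b if v else 1 - b)
        if u or v:
            total += weight
    return total


def relaxed_score(a, b):
    """L_ij with a = mu_i * rho / m and b = nu_j * rho / n."""
    return a + b - a * b


grid = [Fraction(k, 4) for k in range(5)]  # 0, 1/4, ..., 1
for a in grid:
    for b in grid:
        assert union_probability(a, b) == relaxed_score(a, b) == a + (1 - a) * b
# A fully important column (b = 1) gets score 1 for every row leverage a.
print([relaxed_score(a, Fraction(1)) for a in grid])
```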

As discussed earlier, we need at least $(m+n)\varrho - \varrho^2$ observed elements to recover a matrix exactly, regardless of the choice of probabilities. Theorem 1 proves that if we observe elements according to our relaxed leverage scores, we match this lower bound up to a factor of $O(\log^2(m+n))$. Our bound compares with that of [5] as follows:

1) We provide a proof for a general $m \times n$ matrix of rank $\varrho$, as opposed to the $n \times n$ case of [5].

2) Our form of the probabilities gives an upper bound on the expected sample size $E[s]$. To see this, note that $\sum_{i,j} L_{ij} = (m+n)\varrho - \varrho^2$. Hence $E[s] = \sum_{i,j} p_{ij} \le O(((m+n)\varrho - \varrho^2)\log^2(m+n))$, compared to $E[s] \ge O(2\max\{m,n\}\varrho\log^2(m+n))$ in [5].

3) Our sample size for solving (1) improves on that of [5] in terms of degrees of freedom. Note that $(m+n)\varrho - \varrho^2 < (m+n)\varrho \le 2\max\{m,n\}\varrho$.
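The identity behind point 2) is $\sum_{i,j}(a_i + b_j - a_ib_j) = n\sum_i a_i + m\sum_j b_j - (\sum_i a_i)(\sum_j b_j) = (m+n)\varrho - \varrho^2$, with $a_i = \mu_i\varrho/m$ and $b_j = \nu_j\varrho/n$ each summing to $\varrho$. The sum of plain leverage scores is $(m+n)\varrho$, so the two differ by exactly $\varrho^2$. Below is an exact check on a hypothetical leverage profile, with function names of our own choosing.

```python
from fractions import Fraction


def total_scores(row_lev, col_lev):
    """Return (sum_ij L_ij, sum_ij (a_i + b_j)) for a_i = mu_i*rho/m, b_j = nu_j*rho/n."""
    relaxed = sum(a + b - a * b for a in row_lev for b in col_lev)
    leveraged = sum(a + b for a in row_lev for b in col_lev)
    return relaxed, leveraged


# Any valid profile works: every a_i, b_j lies in [0, 1] and each list sums to rho.
m, n, rho = 5, 4, 2
row_lev = [Fraction(1), Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
col_lev = [Fraction(1, 2)] * 4
relaxed, leveraged = total_scores(row_lev, col_lev)
print(relaxed, leveraged, leveraged - relaxed)  # 14 18 4
```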

¹ $\|U^Te_i\|_2^2$ and $\|V^Te_j\|_2^2$ are called the row and column leverage scores, respectively.

Algorithm 1 Column-Space-Incoherent MC
1: Input: $M \in \mathbb{R}^{m\times n}$ with $\mu_i \le \mu_0$, $\forall i \in [m]$, where $1 \le \mu_0 \le m/\varrho$.
2: Pick each row of M independently with probability $p = \min\{c_2\mu_0\varrho\log m/m,\ 1\}$, where $c_2$ is a constant, and observe all the entries of the picked rows.
3: Compute the leverage scores $\{\hat\nu_j\}$, $\forall j \in [n]$, of the space spanned by these rows, and use them as estimates of the true $\{\nu_j\}$, $\forall j \in [n]$, of M.
4: Construct a sample set $\Omega$ of entries $(i,j)$ of M observed with probabilities
   $$p_{ij} = \min\left\{c_1 L_{ij}\log^2(m+n),\ 1\right\},\quad \forall i,j, \qquad (4)$$
   where $L_{ij} = \frac{\mu_0\varrho}{m} + \frac{\hat\nu_j\varrho}{n} - \frac{\mu_0\varrho}{m}\cdot\frac{\hat\nu_j\varrho}{n}$.
5: Solve (1) using the sample set $\Omega$, and let $X^*$ be the unique optimal solution.
6: Output: $X^*$.

Relaxed leverage scores also improve the sample size for uniform sampling of matrices with incoherence restrictions. Let $M \in \mathbb{R}^{n\times n}$ be the rank-$\varrho$ matrix to be recovered, with SVD $U\Sigma V^T$. [2], [13], and [9] use two incoherence parameters, $\mu_0$ and $\mu_1$, for exact matrix completion with uniform sampling. These are defined by (a) $\max_{i,j}\{\mu_i, \nu_j\} \le \mu_0$ and (b) $\|UV^T\|_\infty = \mu_1\sqrt{\varrho/n^2}$. A meaningful range of $\mu_0$ is $1 \le \mu_0 \le \min\{m,n\}/\varrho$.

[13] showed the following for uniform sampling with $p_{ij} = p \ge c_u\max\{\mu_0, \mu_1^2\}\varrho\log^2 n/n$, $\forall i,j$, where $c_u$ is a constant: M is then the unique optimal solution of (1) with high probability. The resulting lower bound on the sample size in [13] (sample-with-replacement model) is $O(\max\{\mu_0, \mu_1^2\}\,n\varrho\log^2 n)$. Since $\mu_1 \le \mu_0\sqrt{\varrho}$, the term $\mu_1^2$ can be as large as $\mu_0^2\varrho$. In the worst case this creates a suboptimal dependence of the sample size on $\varrho$.

Theorem 2 of [5] implies that observing entries with uniform probability $p \ge c_0(2\mu_0\varrho/n)\log^2 n$, $\forall i,j$, for some constant $c_0$, recovers the matrix exactly with high probability. In this case, the lower bound on the sample size is $O(2\mu_0 n\varrho\log^2 n)$. [5] thus eliminated the need for the parameter $\mu_1$, and with it the suboptimal dependence on $\varrho$. It follows from Theorem 1 that we can recover the matrix exactly, with high probability, if each entry is sampled uniformly with probability

$$p = \max\left\{\min\left\{c_1 L_0\log^2(m+n),\ 1\right\},\ (mn)^{-5}\right\},\quad \forall i,j,$$

where $L_0 = \frac{\mu_0\varrho}{m} + \frac{\mu_0\varrho}{n} - \frac{\mu_0^2\varrho^2}{mn}$ and $c_1$ is a constant. This rate suffices because $L_{ij}$ is non-decreasing in both $\mu_i\varrho/m$ and $\nu_j\varrho/n$ on $[0,1]$, and $\mu_i, \nu_j \le \mu_0$, so $L_{ij} \le L_0$. This can improve the sample size of [5] by $O(\mu_0^2\varrho^2\log^2 n)$.
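For the uniform-sampling comparison, the difference in expected sample size between the two rates reduces to arithmetic. The sketch assumes $m = n$ and the same constant $c$ and the same $\log^2(m+n)$ factor for both rates. (The rate of [5] is stated with $\log^2 n$, so the shared log factor is our simplification.) It ignores the $(mn)^{-5}$ floor, and the parameter values are hypothetical.

```python
import math


def relaxed_score(a, b):
    return a + b - a * b


def uniform_probabilities(n, rank, mu0, c):
    """Uniform rates for an n x n matrix (m = n), using one constant c and one log factor:
    relaxed rate c * L0 * log^2(2n) versus [5]-style rate c * (2 mu0 rho / n) * log^2(2n)."""
    a = mu0 * rank / n                  # mu0 * rho / n  (= mu0 * rho / m)
    log_sq = math.log(2 * n) ** 2       # log^2(m + n) with m = n
    p_relaxed = min(c * relaxed_score(a, a) * log_sq, 1.0)   # L0 = 2a - a^2
    p_leveraged = min(c * 2 * a * log_sq, 1.0)
    return p_relaxed, p_leveraged, log_sq


n, rank, mu0, c = 1000, 5, 2.0, 0.01
p_r, p_l, log_sq = uniform_probabilities(n, rank, mu0, c)
saving = n * n * (p_l - p_r)            # difference in expected sample size
print(saving, c * mu0 ** 2 * rank ** 2 * log_sq)  # equal up to rounding

# L_ij never exceeds L0 when mu_i, nu_j <= mu0, so the uniform rate dominates (3).
a0 = mu0 * rank / n
assert all(relaxed_score(x * a0 / 10, y * a0 / 10) <= relaxed_score(a0, a0)
           for x in range(11) for y in range(11))
```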

2.1 Column-Space-Incoherent Matrix Completion 

Here we discuss exact completion of a low-rank matrix whose column space is incoherent, when we control how matrix entries are sampled. This setting arises in application domains such as recommendation systems and gene expression data analysis.

Algorithm 1, adopted from [5], performs exact completion of a matrix M with an incoherent column space, without a priori knowledge of the leverage scores of M. Step 3 of Algorithm 1 computes the column leverage scores of M exactly from only a small number of (uniformly) observed rows. Step 4 constructs an additional sample set $\Omega$ of observed entries using our relaxed leverage scores. Step 5 solves the nuclear norm minimization problem (1) with $\Omega$ to recover M exactly. Theorem 2 proves the correctness of Algorithm 1.
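Steps 2 to 4 of Algorithm 1 are sketched below in plain Python. The helper names, constants, and example matrix are hypothetical, and the span of the observed rows is computed with modified Gram-Schmidt rather than an SVD. Step 5 requires a nuclear-norm solver (the experiments use TFOCS) and is omitted. The example uses $c_2$ large enough that every row is observed. This makes $\hat\nu_j = \nu_j$ hold by construction, not by the high-probability argument of Theorem 2.

```python
import math
import random


def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt; numerically dependent vectors are skipped."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for q in basis:
            coef = sum(a * b for a, b in zip(q, w))
            w = [a - coef * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:
            basis.append([a / norm for a in w])
    return basis


def column_incoherent_sampling(M, rank, mu0, c1, c2, seed=0):
    """Steps 2-4 of Algorithm 1 (hypothetical helper). Step 5, solving (1)
    on the returned Omega, needs a nuclear-norm solver and is omitted."""
    rng = random.Random(seed)
    m, n = len(M), len(M[0])
    # Step 2: keep each row independently with probability p and observe it fully.
    p_row = min(c2 * mu0 * rank * math.log(m) / m, 1.0)
    rows = [i for i in range(m) if rng.random() < p_row]
    # Step 3: column leverage scores of the span of the observed rows.
    Q = orthonormal_basis([M[i] for i in rows])
    nu_hat = [(n / rank) * sum(q[j] ** 2 for q in Q) for j in range(n)]
    # Step 4: relaxed probabilities (4), with mu0 standing in for the unknown mu_i.
    a = mu0 * rank / m
    omega = set()
    for i in range(m):
        for j in range(n):
            b = nu_hat[j] * rank / n
            p_ij = min(c1 * (a + b - a * b) * math.log(m + n) ** 2, 1.0)
            if rng.random() < p_ij:
                omega.add((i, j))
    return rows, nu_hat, omega


# Rank-1 example M = u v^T with u = (1, 1, 1, 1) (incoherent, mu0 = 1) and v = (3, 0, 4).
u, v = [1, 1, 1, 1], [3, 0, 4]
M = [[ui * vj for vj in v] for ui in u]
rows, nu_hat, omega = column_incoherent_sampling(M, rank=1, mu0=1.0, c1=0.05, c2=100.0)
print(nu_hat)  # true nu_j = 3 * v_j^2 / 25, i.e. close to [1.08, 0.0, 1.92]
```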

Theorem 2 Algorithm 1 computes the column leverage scores of M exactly (Step 3), i.e., $\hat\nu_j = \nu_j$, $\forall j \in [n]$. Using the sample set $\Omega$, Algorithm 1 recovers M as the unique optimal solution of (1). The total number of samples required by Algorithm 1 is $\Theta(((m + \mu_0 n)\varrho - \mu_0\varrho^2)\log^2(m+n))$.

These results hold with probability at least $1 - 66\log(m+n)(m+n)^{3-c}$, for sufficiently large $c > 3$.
We now compare the sample size bound in Theorem 2 with a couple of existing results, assuming $m = n$ for simplicity. Theorem 2 can achieve an additive improvement of $O(\varrho^2\log^2 n)$ on the sample size of [5] while recovering M exactly via (1). An adaptive sampling algorithm has also been proposed that recovers M exactly with probability at least $1 - O(\varrho\delta)$, using a sample size of $\Theta(\mu_0 n\varrho^{3/2}\log(\varrho/\delta))$. Assuming comparable failure probabilities, the sample size in Theorem 2 is better when $\varrho$ is not too small.

2.2 Coherent Matrix Completion using Two-Phase-Sampling 

In practice, we do not know the leverage scores of M, i.e., $\{\mu_i\}$ and $\{\nu_j\}$, even when we control which entries to observe. [5] proposed a heuristic two-phase sampling procedure (Algorithm 1 of [5]) for exact matrix completion with no a priori knowledge of the leverage scores. An informal description follows.

Let the total sample budget be s, and let $\beta \in [0,1]$ be a parameter. First, construct an initial set $\Omega_1$ by sampling entries uniformly (without replacement), such that $|\Omega_1| = \beta s$. Let $\hat M$ be the matrix with $\hat M_{ij} = M_{ij}$ if $(i,j) \in \Omega_1$, and $\hat M_{ij} = 0$ if $(i,j) \notin \Omega_1$. Let the rank-$\varrho$ SVD of $\hat M$ be $\hat U\hat\Sigma\hat V^T$. Compute the leverage scores of $\hat M$ and use them as estimates of the leverage scores of M. That is, use $\hat\mu_i = \frac{m}{\varrho}\|\hat U^Te_i\|_2^2$ as $\mu_i$ for $i \in [m]$, and $\hat\nu_j = \frac{n}{\varrho}\|\hat V^Te_j\|_2^2$ as $\nu_j$ for $j \in [n]$. In the second phase, use these estimates to sample (without replacement) the remaining $(1-\beta)s$ entries of M with probabilities proportional to $(\hat\mu_i\varrho/m + \hat\nu_j\varrho/n)\log^2(m+n)$; these form the sample set $\Omega_2$. Then perform matrix completion in (1) using the sample set $\Omega = \Omega_1 \cup \Omega_2$.

This heuristic is shown to work well on less coherent synthetic data ([5]). As expected, it works poorly for highly coherent data, e.g., when only a few entries are non-zero. Our relaxed leverage scores can be incorporated into the second phase of this procedure. We observe (without replacement) the remaining $(1-\beta)s$ entries of M with probabilities $p_{ij} \propto \left(\frac{\hat\mu_i\varrho}{m} + \frac{\hat\nu_j\varrho}{n} - \frac{\hat\mu_i\varrho}{m}\cdot\frac{\hat\nu_j\varrho}{n}\right)\log^2(m+n)$ to form the sample set $\Omega_2$. We then perform nuclear norm minimization in (1) using $\Omega = \Omega_1 \cup \Omega_2$. We expect relaxed leverage score sampling to follow a similar trend, although we do not evaluate this heuristic numerically.

Section 3 presents experimental results on real datasets that support the theoretical gain in sample size from relaxed leverage score sampling. Section 4 and Section 5 give proof sketches of Theorem 1 and Theorem 2, respectively, closely following the proof outline of [5]. Section 4 also highlights the main technical differences between our result and [5].


3 Experiments 


We evaluate exact recovery of real data matrices via nuclear norm minimization (1) using our relaxed leverage score sampling. We solve (1) with the software 'TFOCS' v1.2, written by Stephen Becker, Emmanuel Candès, and Michael Grant.

Let M be the rank-$\varrho$ data matrix. We form the sample set $\Omega_{relax}$ by observing the $(i,j)$-th entry of M according to the relaxed leverage score probabilities

$$p^{[relax]}_{ij} = \min\{c_r \cdot L_{ij},\ 1\},\quad \forall i,j, \qquad (5)$$

where $L_{ij} = \frac{\mu_i\varrho}{m} + \frac{\nu_j\varrho}{n} - \frac{\mu_i\varrho}{m}\cdot\frac{\nu_j\varrho}{n}$ and $c_r$ is a universal constant. Similarly, we form the sample set $\Omega_{lev}$ by observing $M_{ij}$ according to the leverage score probabilities of [5]:

$$p^{[lev]}_{ij} = \min\left\{c_l\left(\frac{\mu_i\varrho}{m} + \frac{\nu_j\varrho}{n}\right),\ 1\right\},\quad \forall i,j, \qquad (6)$$

where $c_l$ is a universal constant. We use $\Omega_{relax}$ and $\Omega_{lev}$ separately in the optimization problem (1) to recover M. Let $X^*$ be the optimal solution to (1) using a sample set $\Omega$. We say $X^*$ recovers M exactly if $\|M - X^*\|_F < \epsilon$, where $\epsilon$ is a tiny fraction; we set $\epsilon = 0.001$. We perform 10 independent trials (sampling and recovery) and declare success if M is recovered exactly at least 9 times. Let $s_r$ and $s_l$ be the average sample sizes for successful recovery of M using $\Omega_{relax}$ and $\Omega_{lev}$, respectively. As suggested by the theory, we expect $c_r \approx c_l$ and a strictly positive gain in sample size $(s_l - s_r)$. We also investigate how $(s_l - s_r)$ behaves with respect to the rank $\varrho$. For this, we define

$$\text{Normalized Gain } (\Delta_s) = \sqrt{(s_l - s_r)/c_r}. \qquad (7)$$

Since the theory suggests $(s_l - s_r) \propto O(\varrho^2)$, we expect $\Delta_s$ to be close to $\varrho$. For a fair comparison, we use the same random seed for both sampling methods (5) and (6).
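The next sketch shows why $\Delta_s \approx \varrho$ is the natural target. Without clipping at 1, (5) and (6) with $c_l = c_r = c$ give expected sizes that differ by $c\,(\sum_i \mu_i\varrho/m)(\sum_j \nu_j\varrho/n) = c\varrho^2$. Clipping shrinks the gap, which is consistent with measured gains slightly below $\varrho$. The code computes expectations rather than the averages of observed sizes used in the experiments. The function names and the leverage profile are hypothetical, and no solver is involved.

```python
import math


def expected_sizes(row_lev, col_lev, c_r, c_l):
    """Expected |Omega_relax| under (5) and |Omega_lev| under (6).
    row_lev[i] = mu_i * rho / m and col_lev[j] = nu_j * rho / n."""
    s_r = sum(min(c_r * (a + b - a * b), 1.0) for a in row_lev for b in col_lev)
    s_l = sum(min(c_l * (a + b), 1.0) for a in row_lev for b in col_lev)
    return s_r, s_l


def normalized_gain(s_l, s_r, c_r):
    """Delta_s in (7)."""
    return math.sqrt((s_l - s_r) / c_r)


def recovered_exactly(errors, eps=1e-3, required=9):
    """Success rule: ||M - X*||_F < eps in at least `required` of the trials."""
    return sum(e < eps for e in errors) >= required


# Perfectly incoherent rank-4 profile on a 100 x 50 matrix (illustrative values).
rho, m, n = 4, 100, 50
row_lev, col_lev = [rho / m] * m, [rho / n] * n
s_r, s_l = expected_sizes(row_lev, col_lev, c_r=2.0, c_l=2.0)
print(normalized_gain(s_l, s_r, c_r=2.0))  # close to rho = 4 (no clipping here)
```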

3.1 Datasets 

We use the following two datasets.

MovieLens: This collaborative filtering dataset contains 100,000 ratings, from 1 to 5, given by 943 users to 1682 movies. Each user has rated at least 20 movies. The dataset is not numerically low-rank. We therefore perform rank truncation to create an explicitly low-rank matrix to which the theory for (1) applies. From the singular value spectrum of the data, we heuristically choose two values for the rank: $\varrho = 10$ and $\varrho = 20$.

TechTC: We use a dataset from the Technion Repository of Text Categorization Database (TechTC) ([8]). Each row is a document describing a topic, and the words (columns) are the features for the topics. The $(i,j)$-th entry of the matrix is the frequency of the j-th word in the i-th document. We choose a dataset containing the topics with IDs 11346 and 22294. We preprocess the data by removing all words of length four or less and then normalizing each row to unit norm. From the singular value spectrum of this preprocessed $125 \times 14392$ data matrix, we heuristically choose two values for the rank, $\varrho = 10$ and $\varrho = 20$, which make the data explicitly low-rank.

3.2 Results 

Figures 1 and 2 plot the singular values and the normalized leverage scores of the rank-10 approximations of the MovieLens and TechTC data, respectively. Normalized leverage scores are close to 1 when the matrix is incoherent. MovieLens is reasonably coherent, and TechTC is extremely coherent. Table 2 shows the constants $c_l$ and $c_r$ and the normalized gain $\Delta_s$ for exact recovery of the MovieLens data. As expected, $c_l = c_r$ and $\Delta_s \approx \varrho$. Table 3 shows similar results for the TechTC data. Overall, these results support the theoretical analysis of the gain in sample size from relaxed leverage score sampling for exact recovery of a low-rank matrix via (1).

Figure 1: [MovieLens] Singular values and the leverage scores for $\varrho = 10$.
Figure 2: [TechTC] Singular values and the leverage scores for $\varrho = 10$.

Table 2: [MovieLens] Gain in sample size for exact recovery using relaxed leverage score sampling.

                | $c_l / c_r$ | $\Delta_s$
$\varrho = 10$  | 11/11       | 9.7
$\varrho = 20$  | 7/7         | 18.4

Table 3: [TechTC] Gain in sample size for exact recovery using relaxed leverage score sampling.

                | $c_l / c_r$ | $\Delta_s$
$\varrho = 10$  | 4/4         | 6.6
$\varrho = 20$  | 3/3         | 15.2


4 Proof of Theorem 1

The main proof strategy is as follows. To show that M is the unique optimal solution to (1), it suffices to construct a dual certificate Y that obeys specific subgradient conditions (see Section 6 for more detail). Our proof of Theorem 1 closely follows the proof strategy of [5]. Before stating the optimality conditions, we need some additional notation.

Recall that U and V are the left and right singular matrices of M, respectively. Let $u_k$ (respectively $v_k$) denote the k-th column of U (respectively V). Let T be the linear space spanned by elements of the form $u_ky^T$ and $xv_k^T$, $1 \le k \le \varrho$, for arbitrary x and y. Let $T^\perp$ be its orthogonal complement. That is, $T^\perp$ is spanned by the family $(xy^T)$, where x (respectively y) is any vector orthogonal to the space spanned by the left (respectively right) singular vectors. The orthogonal projection onto T is then the linear operator $\mathcal{P}_T: \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$, defined as

$$\mathcal{P}_T(X) = UU^TX + XVV^T - UU^TXVV^T.$$

Similarly, the orthogonal projection onto $T^\perp$ is

$$\mathcal{P}_{T^\perp}(X) = X - \mathcal{P}_T(X) = U_\perp U_\perp^TXV_\perp V_\perp^T,$$

where the columns of $U_\perp$ (respectively $V_\perp$) form an orthonormal basis of the orthogonal complement of the column space of U (respectively V).
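The projection formulas can be checked exactly on a tiny example with rational orthonormal vectors. The sketch below confirms four properties: $\mathcal{P}_T$ is idempotent, $\mathcal{P}_T + \mathcal{P}_{T^\perp}$ is the identity, the two ranges are orthogonal under $\langle X, Y\rangle = \mathrm{Tr}(X^TY)$, and $UV^T \in T$. The helper names and the choice $\varrho = 1$, $u = (3/5, 4/5, 0)$, $v = (3/5, 4/5)$ are assumptions for illustration.

```python
from fractions import Fraction as F


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def combine(A, B, sign=1):
    """Entrywise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(k):
    return [[F(int(i == j)) for j in range(k)] for i in range(k)]


def make_projections(U, V):
    """P_T and P_{T-perp} for U (m x rho) and V (n x rho) with orthonormal columns."""
    PU, PV = matmul(U, transpose(U)), matmul(V, transpose(V))
    QU = combine(identity(len(U)), PU, -1)   # U_perp U_perp^T = I - U U^T
    QV = combine(identity(len(V)), PV, -1)   # V_perp V_perp^T = I - V V^T

    def P_T(X):
        UX = matmul(PU, X)
        return combine(combine(UX, matmul(X, PV)), matmul(UX, PV), -1)

    def P_T_perp(X):
        return matmul(matmul(QU, X), QV)

    return P_T, P_T_perp


def inner(A, B):
    """<A, B> = Tr(A^T B)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Exact rational example: rho = 1, u = (3/5, 4/5, 0), v = (3/5, 4/5).
U = [[F(3, 5)], [F(4, 5)], [F(0)]]
V = [[F(3, 5)], [F(4, 5)]]
P_T, P_T_perp = make_projections(U, V)
X = [[F(1), F(2)], [F(3), F(4)], [F(5), F(6)]]
assert P_T(P_T(X)) == P_T(X)                  # idempotent
assert combine(P_T(X), P_T_perp(X)) == X      # P_T + P_T_perp = identity
assert inner(P_T(X), P_T_perp(X)) == 0        # orthogonal ranges
UVt = matmul(U, transpose(V))
assert P_T(UVt) == UVt                        # UV^T lies in T
print(P_T_perp(X))
```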

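Note that any $m \times n$ matrix X can be expressed as a sum of rank-one matrices as follows:

$$X = \sum_{i,j=1}^{m,n}\langle e_ie_j^T, X\rangle\,e_ie_j^T. \qquad (8)$$

We define the sampling operator $\mathcal{R}_\Omega: \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$ as

$$\mathcal{R}_\Omega(X) = \sum_{i,j=1}^{m,n}\frac{\delta_{ij}}{p_{ij}}\langle e_ie_j^T, X\rangle\,e_ie_j^T, \qquad (9)$$

where $\delta_{ij} = \mathbb{I}((i,j) \in \Omega)$ and $\mathbb{I}(\cdot)$ is the indicator function. That is, $\mathcal{R}_\Omega$ forms the partial sum in (9) by extracting from (8) the terms with indices $(i,j) \in \Omega$, each reweighted by $1/p_{ij}$. Let $\mathcal{P}_\Omega(X)$ be the matrix with $(\mathcal{P}_\Omega(X))_{ij} = X_{ij}$ if $(i,j) \in \Omega$, and zero otherwise.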
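The $1/p_{ij}$ weights in (9) make $\mathcal{R}_\Omega$ an unbiased estimate of the identity under the Bernoulli model: $E[\delta_{ij}/p_{ij}] = 1$, so $E[\mathcal{R}_\Omega(X)] = X$. By contrast, $E[\mathcal{P}_\Omega(X)]_{ij} = p_{ij}X_{ij}$. Below is a Monte Carlo sketch with a fixed seed. The probabilities and the matrix are hypothetical.

```python
import random


def sample_bernoulli(p, rng):
    """Omega ~ Bernoulli(p_ij): entry (i, j) is kept independently with probability p[i][j]."""
    return {(i, j) for i, row in enumerate(p) for j, pij in enumerate(row) if rng.random() < pij}


def R_omega(X, omega, p):
    """R_Omega in (9): observed entries reweighted by 1 / p_ij, all others set to zero."""
    return [[X[i][j] / p[i][j] if (i, j) in omega else 0.0 for j in range(len(X[0]))]
            for i in range(len(X))]


def P_omega(X, omega):
    """P_Omega: observed entries kept as they are, all others set to zero."""
    return [[X[i][j] if (i, j) in omega else 0.0 for j in range(len(X[0]))]
            for i in range(len(X))]


# Averaging many independent draws of R_Omega(X) should approach X.
rng = random.Random(0)
X = [[1.0, -2.0], [3.0, 0.5]]
p = [[0.5, 0.8], [0.3, 0.9]]
trials = 20000
avg = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(trials):
    R = R_omega(X, sample_bernoulli(p, rng), p)
    for i in range(2):
        for j in range(2):
            avg[i][j] += R[i][j] / trials
print(avg)  # entrywise close to X
```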
4.1 Optimality Conditions

Following the proof road map of [13] and [5], we restate the sufficient conditions for M to be the unique optimal solution to (1). Section 6 contains a proof of sufficiency.

Proposition 1 The rank-$\varrho$ matrix $M \in \mathbb{R}^{m\times n}$ with SVD $M = U\Sigma V^T$ is the unique optimal solution to (1) if the following conditions hold:

1. $\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T\|_{op} \le 1/2$.

2. There exists a dual certificate Y which satisfies $\mathcal{P}_\Omega(Y) = Y$, and

(a) $\|\mathcal{P}_T(Y) - UV^T\|_F \le \sqrt{\varrho}\,(m+n)^{-15}$,

(b) $\|\mathcal{P}_{T^\perp}(Y)\|_2 \le 1/2$.

Condition 1 of Proposition 1 says that $\mathcal{R}_\Omega$ should be nearly the identity operator on the subspace T. Next we discuss the construction of a dual certificate Y.


4.1.1 Constructing the Dual Certificate 

We follow the so-called golfing scheme to construct a matrix Y (the dual certificate) that satisfies Condition 2 in Proposition 1. Recall that the set of observed elements $\Omega$ follows the Bernoulli model with parameters $p_{ij}$: each index $(i,j)$ is observed independently with $\mathbb{P}[(i,j) \in \Omega] = p_{ij}$, where $p_{ij}$ is as in (3). We denote this by $\Omega \sim \mathrm{Bernoulli}(p_{ij})$. Further, we assume that $\Omega$ is generated as $\Omega = \bigcup_{k=1}^{k_0}\Omega_k$, where the $\Omega_k \sim \mathrm{Bernoulli}(q_{ij})$ are independent and $q_{ij} = 1 - (1 - p_{ij})^{1/k_0}$. This gives $\mathbb{P}[(i,j) \in \Omega] = 1 - (1 - q_{ij})^{k_0} = p_{ij}$, which is the original Bernoulli model for $\Omega$. Note that $q_{ij} \ge p_{ij}/k_0$ because the $\Omega_k$'s can overlap. We set $k_0 = 11\log(m+n)$. Then,

$$q_{ij} \ge \min\left\{c_0\log(m+n)\left(\frac{\mu_i\varrho}{m} + \frac{\nu_j\varrho}{n} - \frac{\mu_i\varrho}{m}\cdot\frac{\nu_j\varrho}{n}\right),\ 1\right\}, \qquad (10)$$

where $c_0 = c_1/11$. Starting with $W_0 = 0$, for each $k = 1, \dots, k_0$ we recursively define

$$W_k = W_{k-1} + \mathcal{R}_{\Omega_k}\mathcal{P}_T\left(UV^T - \mathcal{P}_T(W_{k-1})\right), \qquad (11)$$

where the sampling operator $\mathcal{R}_{\Omega_k}: \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$ is defined as

$$\mathcal{R}_{\Omega_k}(X) = \sum_{i,j}\frac{1}{q_{ij}}\,\mathbb{I}((i,j) \in \Omega_k)\,\langle e_ie_j^T, X\rangle\,e_ie_j^T.$$

We set $Y = W_{k_0}$. This Y is supported on $\Omega$, i.e., $\mathcal{P}_\Omega(Y) = Y$.

Let the sample set $\Omega$ be such that

$$\Omega \in \left\{\Omega_k : \Omega = \textstyle\bigcup_{k=1}^{k_0}\Omega_k,\ \Omega_k \sim \mathrm{Bernoulli}(q_{ij})\right\}. \qquad (12)$$

Since $\Omega_k \sim \mathrm{Bernoulli}(q_{ij})$ for $k = 1, \dots, k_0$ implies $\Omega \sim \mathrm{Bernoulli}(p_{ij})$, Lemma 1 proves Condition 1 of Proposition 1 using a sample set $\Omega$ in (12).
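The batch probabilities of the golfing scheme can be checked numerically. With $q_{ij} = 1 - (1 - p_{ij})^{1/k_0}$, the union of $k_0$ independent batches is observed with probability exactly $p_{ij}$, and $q_{ij} \ge p_{ij}/k_0$ (by Bernoulli's inequality). The text sets $k_0 = 11\log(m+n)$. Rounding it up to an integer number of batches is our assumption, as are the example values.

```python
import math


def batch_probability(p, k0):
    """q_ij = 1 - (1 - p_ij)^(1 / k0): per-batch probability of the golfing scheme."""
    return 1.0 - (1.0 - p) ** (1.0 / k0)


def union_probability(q, k0):
    """P[(i, j) in Omega_1 u ... u Omega_k0] for k0 independent Bernoulli(q) batches."""
    return 1.0 - (1.0 - q) ** k0


m_plus_n = 1000
k0 = math.ceil(11 * math.log(m_plus_n))   # rounding up to an integer is our assumption
for p in [1e-3, 0.1, 0.5, 0.99, 1.0]:
    q = batch_probability(p, k0)
    assert math.isclose(union_probability(q, k0), p, rel_tol=1e-9)  # recovers p_ij
    assert q >= p / k0                                               # q_ij >= p_ij / k0
print(k0)  # 76
```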

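Lemma 1 Let $\Omega$ be a sample set in (12). Then, for any universal constant $c > 1$, we have

$$\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T\|_{op} \le \frac{1}{2} \qquad (13)$$

with probability at least $1 - (m+n)^{1-c}$.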

Before validating Condition 2 in Proposition 1 for the Y constructed above, we state some results that hold with high probability. First, we borrow from [5] the following definitions of weighted infinity norms for a matrix $Z \in \mathbb{R}^{m\times n}$:

$$\|Z\|_{\mu(\infty,2)} := \max\left\{\max_i\sqrt{\frac{m}{\mu_i\varrho}}\,\|Z_{i,*}\|_2,\ \max_j\sqrt{\frac{n}{\nu_j\varrho}}\,\|Z_{*,j}\|_2\right\},$$

$$\|Z\|_{\mu(\infty)} := \max_{i,j}\sqrt{\frac{m}{\mu_i\varrho}}\,|Z_{ij}|\,\sqrt{\frac{n}{\nu_j\varrho}},$$

where $Z_{i,*}$ and $Z_{*,j}$ denote the i-th row and j-th column of Z, respectively.
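Here is a numerical sketch of the two weighted norms, using $\mu_i\varrho/m = \|U^Te_i\|_2^2$ and $\nu_j\varrho/n = \|V^Te_j\|_2^2$. Applied to $UV^T$, it reproduces the values used later in the proof: $\|UV^T\|_{\mu(\infty,2)} = 1$ and $\|UV^T\|_{\mu(\infty)} \le 1$. The helper names and the rank-2 example are hypothetical, and the code assumes every leverage score is positive (the norms divide by them).

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def weighted_norms(Z, U, V):
    """(||Z||_{mu(inf,2)}, ||Z||_{mu(inf)}) for U (m x rho), V (n x rho) with orthonormal
    columns. Uses mu_i * rho / m = ||U^T e_i||^2; assumes every leverage score is positive."""
    m, n = len(Z), len(Z[0])
    row_lev = [sum(x * x for x in U[i]) for i in range(m)]
    col_lev = [sum(x * x for x in V[j]) for j in range(n)]
    row_part = max(math.sqrt(sum(z * z for z in Z[i]) / row_lev[i]) for i in range(m))
    col_part = max(math.sqrt(sum(Z[i][j] ** 2 for i in range(m)) / col_lev[j]) for j in range(n))
    entry_part = max(abs(Z[i][j]) / math.sqrt(row_lev[i] * col_lev[j])
                     for i in range(m) for j in range(n))
    return max(row_part, col_part), entry_part


# Rank-2 example with orthonormal columns: U is 3 x 2, V is a 2 x 2 rotation.
s3, s2 = 1 / math.sqrt(3), 1 / math.sqrt(2)
U = [[s3, s2], [s3, -s2], [s3, 0.0]]
V = [[0.6, -0.8], [0.8, 0.6]]
UVt = matmul(U, [list(c) for c in zip(*V)])
print(weighted_norms(UVt, U, V))  # first value 1 (up to rounding), second at most 1
```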


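Lemma 2 bounds the spectral norm of the matrix $(\mathcal{R}_\Omega - \mathcal{I})(Z)$, where $\mathcal{I}$ is the identity operator, using the sample set $\Omega$.

Lemma 2 Let $Z \in \mathbb{R}^{m\times n}$ be a fixed matrix, and let $\Omega$ be a sample set in (12). Then, for any universal constant $c > 1$, we have

$$\|(\mathcal{R}_\Omega - \mathcal{I})Z\|_2 \le 2\sqrt{\frac{c}{c_0}}\,\|Z\|_{\mu(\infty,2)} + \frac{c}{c_0}\,\|Z\|_{\mu(\infty)}$$

with probability at least $1 - (m+n)^{1-c}$.

The next two results control the $\mu(\infty,2)$ and $\mu(\infty)$ norms of the projection of a matrix after random sampling.

Lemma 3 Let $Z \in \mathbb{R}^{m\times n}$ be a fixed matrix, and let $\Omega$ be a sample set in (12). Then, for any universal constant $c > 2$, we have

$$\|(\mathcal{P}_T\mathcal{R}_\Omega - \mathcal{P}_T)Z\|_{\mu(\infty,2)} \le \frac{1}{2}\left(\|Z\|_{\mu(\infty,2)} + \|Z\|_{\mu(\infty)}\right)$$

with probability at least $1 - (m+n)^{2-c}$.

Lemma 4 Let $Z \in \mathbb{R}^{m\times n}$ be a fixed matrix, and let $\Omega$ be a sample set in (12). Then, for any universal constant $c > 3$, we have

$$\|(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T)Z\|_{\mu(\infty)} \le \frac{1}{2}\|Z\|_{\mu(\infty)}$$

with probability at least $1 - (m+n)^{3-c}$.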

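We now validate Condition 2 in Proposition 1 for the Y constructed above.

Bounding $\|UV^T - \mathcal{P}_T(Y)\|_F$

We set $\Delta_k = UV^T - \mathcal{P}_T(W_k)$ for $k = 0, 1, \dots, k_0$, so that $\Delta_0 = UV^T$. From the definition of $W_k$ in (11), we have

$$\Delta_k = (\mathcal{P}_T - \mathcal{P}_T\mathcal{R}_{\Omega_k}\mathcal{P}_T)\Delta_{k-1}.$$

Here we used $\mathcal{P}_T(UV^T) = UV^T$ and $\mathcal{P}_T\mathcal{P}_T(X) = \mathcal{P}_T(X)$. Using the independence of $\Delta_{k-1}$ and $\Omega_k$,

$$\|\Delta_k\|_F = \|(\mathcal{P}_T - \mathcal{P}_T\mathcal{R}_{\Omega_k}\mathcal{P}_T)\Delta_{k-1}\|_F \le \|\mathcal{P}_T - \mathcal{P}_T\mathcal{R}_{\Omega_k}\mathcal{P}_T\|_{op}\,\|\Delta_{k-1}\|_F.$$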
We can bound this by applying Lemma 1 recursively with $\Omega_k$, for all k. Thus, recalling that $\|UV^T\|_F = \sqrt{\varrho}$,

$$\|\mathcal{P}_T(Y) - UV^T\|_F = \|\Delta_{k_0}\|_F \le \left(\frac{1}{2}\right)^{k_0}\|UV^T\|_F \le \frac{\sqrt{\varrho}}{(m+n)^{15}}.$$

This bound fails with probability at most $(m+n)^{1-c}$ for each k, so the total probability of failure is at most $11(m+n)^{1-c}\log(m+n)$.

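Bounding $\|\mathcal{P}_{T^\perp}(Y)\|_2$

By definition, Y can be written as

$$Y = \sum_{k=1}^{k_0}\mathcal{R}_{\Omega_k}\mathcal{P}_T\left(UV^T - \mathcal{P}_T(W_{k-1})\right) = \sum_{k=1}^{k_0}\mathcal{R}_{\Omega_k}(\Delta_{k-1}).$$

It follows that

$$\|\mathcal{P}_{T^\perp}(Y)\|_2 = \left\|\sum_{k=1}^{k_0}\mathcal{P}_{T^\perp}(\mathcal{R}_{\Omega_k} - \mathcal{I})(\Delta_{k-1})\right\|_2 \le \sum_{k=1}^{k_0}\|(\mathcal{R}_{\Omega_k} - \mathcal{I})(\Delta_{k-1})\|_2.$$

For the equality we use $\mathcal{P}_T(\Delta_k) = \mathcal{P}_T(UV^T - \mathcal{P}_T(W_k)) = UV^T - \mathcal{P}_T(W_k) = \Delta_k$ for all k, which gives $\mathcal{P}_{T^\perp}(\Delta_{k-1}) = 0$. The inequality uses $\|\mathcal{P}_{T^\perp}(X)\|_2 \le \|X\|_2$.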

Applying Lemma 2 to each summand in the above inequality, with the corresponding $\Omega_k$, we obtain

$$\|\mathcal{P}_{T^\perp}(Y)\|_2 \le 2\sqrt{\frac{c}{c_0}}\sum_{k=1}^{k_0}\|\Delta_{k-1}\|_{\mu(\infty,2)} + \frac{c}{c_0}\sum_{k=1}^{k_0}\|\Delta_{k-1}\|_{\mu(\infty)}. \qquad (14)$$


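Applying Lemma 4 k times, with $\Omega_k, \Omega_{k-1}, \dots, \Omega_1$, we can derive

$$\|\Delta_k\|_{\mu(\infty)} = \|(\mathcal{P}_T - \mathcal{P}_T\mathcal{R}_{\Omega_k}\mathcal{P}_T)\Delta_{k-1}\|_{\mu(\infty)} \le \frac{1}{2}\|\Delta_{k-1}\|_{\mu(\infty)} \le \dots \le \left(\frac{1}{2}\right)^k\|UV^T\|_{\mu(\infty)}, \qquad (15)$$

with failure probability at most $k(m+n)^{3-c}$, for all k.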

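Similarly, applying Lemma 3 and Lemma 4 recursively with $\Omega_k$, we can derive

$$\begin{aligned}
\|\Delta_k\|_{\mu(\infty,2)} &= \|(\mathcal{P}_T - \mathcal{P}_T\mathcal{R}_{\Omega_k}\mathcal{P}_T)\Delta_{k-1}\|_{\mu(\infty,2)} \le \tfrac{1}{2}\|\Delta_{k-1}\|_{\mu(\infty)} + \tfrac{1}{2}\|\Delta_{k-1}\|_{\mu(\infty,2)}\\
&\le \sum_{i=1}^{j}\left(\tfrac{1}{2}\right)^i\|\Delta_{k-i}\|_{\mu(\infty)} + \left(\tfrac{1}{2}\right)^j\|\Delta_{k-j}\|_{\mu(\infty,2)} \qquad \text{(step } j)\\
&\le \sum_{i=1}^{j}\left(\tfrac{1}{2}\right)^i\left(\tfrac{1}{2}\right)^{k-i}\|UV^T\|_{\mu(\infty)} + \left(\tfrac{1}{2}\right)^j\|\Delta_{k-j}\|_{\mu(\infty,2)}\\
&= j\left(\tfrac{1}{2}\right)^k\|UV^T\|_{\mu(\infty)} + \left(\tfrac{1}{2}\right)^j\|\Delta_{k-j}\|_{\mu(\infty,2)}\\
&\le k\left(\tfrac{1}{2}\right)^k\|UV^T\|_{\mu(\infty)} + \left(\tfrac{1}{2}\right)^k\|UV^T\|_{\mu(\infty,2)}, \qquad \text{(step } k)
\end{aligned} \qquad (16)$$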
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holding with failure probability at most k ■ {m + n) 2 c , for all k. Using (|15l) and (1161) . it follows 
from (ED, 




We note that, for 



fco / -, \ k —l 


UV J 


I oo,2) 


UV J 


I ^(00,2) 


(UV J )i,*|| 2 = ||e 2 uv J ||, = 


|(UV^| H efUVS|<^Vv^ 

||(UV T 


we 

m ’ 


*j II 2 - 11 II 2 _ 


Vj6 

n 


Thus, 


UV J 


I fi ( 00,2) 


m 


= max max J—- ||(UV )i,*|| 2 ,maxj— ||(UV )*,j|| 2 f = 1 




n 


j V we 


/*ij II2 


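Therefore,

$$\begin{aligned}
\|\mathcal{P}_{T^\perp}(Y)\|_2 &\le 2\sqrt{\frac{c}{c_0}}\sum_{k=1}^{k_0}\left[(k-1)\left(\frac{1}{2}\right)^{k-1} + \left(\frac{1}{2}\right)^{k-1}\right] + \frac{c}{c_0}\sum_{k=1}^{k_0}\left(\frac{1}{2}\right)^{k-1} \\
&\le 2\sqrt{\frac{c}{c_0}}\left[\sum_{k=1}^{\infty}(k-1)\left(\frac{1}{2}\right)^{k-1} + \sum_{k=1}^{\infty}\left(\frac{1}{2}\right)^{k-1}\right] + \frac{c}{c_0}\sum_{k=1}^{\infty}\left(\frac{1}{2}\right)^{k-1} \\
&= 8\sqrt{\frac{c}{c_0}} + \frac{2c}{c_0} \le \frac{1}{2},
\end{aligned}$$

by setting $c_0 \ge 264c$.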
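The constant 264 comes from elementary arithmetic: both series equal 2, so the bound is $8\sqrt{c/c_0} + 2c/c_0$, and the requirement is that this is at most $1/2$. A few lines of Python confirm the series values and the smallest admissible integer ratio $c_0/c$. The function name `perp_bound` is ours.

```python
import math

def perp_bound(ratio):
    """Bound 8*sqrt(c/c0) + 2c/c0 on ||P_{T^perp}(Y)||_2, written with ratio = c0/c."""
    return 8 * math.sqrt(1 / ratio) + 2 / ratio

# sum_{k>=1} (k-1)(1/2)^(k-1) = 2 and sum_{k>=1} (1/2)^(k-1) = 2 (truncated at 200 terms).
weighted = sum(j * 0.5**j for j in range(200))
geometric = sum(0.5**j for j in range(200))

# perp_bound decreases in ratio, so the first integer ratio meeting 1/2 is the smallest.
smallest_ratio = next(r for r in range(1, 10_000) if perp_bound(r) <= 0.5)
print(weighted, geometric, smallest_ratio)
```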

Let the total number of sampled entries be $s$. The expected number of observed entries required to solve (1) is $\mathbb{E}(s) = \sum_{i,j}p_{ij} = O\left(((m+n)\rho - \rho^2)\log^2(m+n)\right)$. Summing up the individual failure probabilities in Proposition 1, the total failure probability never exceeds $33\log(m+n)(m+n)^{3-c}$, for sufficiently large $c > 3$.

Finally, we can apply Hoeffding's inequality to show that $s$ is sharply concentrated around its expectation, i.e., $s = \Theta\left(((m+n)\rho - \rho^2)\log^2(m+n)\right)$ with probability at least $1 - 66\log(m+n)(m+n)^{3-c}$, for sufficiently large $c > 3$.
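The degrees-of-freedom count in $\mathbb{E}(s)$ rests on the identity $\sum_{i,j}\left(\frac{\mu_i\rho}{m} + \frac{\nu_j\rho}{n} - \frac{\mu_i\rho}{m}\cdot\frac{\nu_j\rho}{n}\right) = n\rho + m\rho - \rho^2$. It holds because the scaled leverage scores sum to $\rho$ on each side. The sketch below checks it numerically. The capped form `min(c0 * base * log(m+n), 1)` mirrors the per-batch probabilities used in Section 7, but it is our assumption about the exact form of (10). The function name and `c0 = 2.0` are illustrative.

```python
import numpy as np

def relaxed_probabilities(U, V, c0):
    """Hypothetical q_ij = min(c0 * (x_i + y_j - x_i*y_j) * log(m+n), 1) and the base scores."""
    m, n = U.shape[0], V.shape[0]
    x, y = np.sum(U**2, axis=1), np.sum(V**2, axis=1)   # mu_i*rho/m, nu_j*rho/n
    base = x[:, None] + y[None, :] - np.outer(x, y)
    return np.minimum(c0 * base * np.log(m + n), 1.0), base

rng = np.random.default_rng(2)
m, n, rho = 50, 40, 4
U, _ = np.linalg.qr(rng.standard_normal((m, rho)))
V, _ = np.linalg.qr(rng.standard_normal((n, rho)))
q, base = relaxed_probabilities(U, V, c0=2.0)
print(base.sum(), (m + n) * rho - rho**2)   # both equal 344 up to rounding
```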


This completes the proof of Theorem 1.


5 Proof of Theorem 2

We closely follow the proof given by [5]. We pick each row of $M$ with some probability $p$ and observe all the entries of the sampled rows. Let $\Gamma \subseteq [m]$ be the set of indices of the rows picked, and let $\mathcal{S}_\Gamma(X)$ be the matrix obtained from $X$ by zeroing out the rows outside $\Gamma$. Recall that the SVD of $M$ is $U\Sigma V^T$. We use the following lemma (Lemma 14 of [5]).

Lemma 5 Let $\frac{m}{\rho}\|U^Te_i\|_2^2 \le \mu_0$ for all $i \in [m]$, and let $p \ge c_2\frac{\mu_0\rho}{m}\log m$ for some universal constant $c_2$. Then, for any universal constant $c > 1$ and $c_2 \ge 20c$,

$$\left\|\frac{1}{p}U^T\mathcal{S}_\Gamma(U) - I_\rho\right\|_2 \le \frac{1}{2}$$

holds with probability at least $1 - (m+n)^{1-c}$, where $I_\rho$ is the identity matrix in $\mathbb{R}^{\rho\times\rho}$.
Now, $\left\|\frac{1}{p}U^T\mathcal{S}_\Gamma(U) - I_\rho\right\|_2 \le 1/2$ implies that $U^T\mathcal{S}_\Gamma(U)$ is invertible and that $\mathcal{S}_\Gamma(U) \in \mathbb{R}^{m\times\rho}$ has rank $\rho$. Using the SVD of $M$, we can write $\mathcal{S}_\Gamma(M) = \mathcal{S}_\Gamma(U)\Sigma V^T$, and this matrix has full rank $\rho$. Therefore, $\mathcal{S}_\Gamma(M)$ and $M$ have the same row space, and we conclude that the column leverage scores computed from $\mathcal{S}_\Gamma(M)$ satisfy $\hat{\nu}_j = \nu_j$, $\forall j \in [n]$. Thus, using the sample set in Algorithm 1, we can recover $M$ exactly via nuclear norm minimization in (1), with high probability. The expected number of entries observed in Algorithm 1 is

$$pmn + \sum_{i,j=1}^{m,n}p_{ij} = O\left(\mu_0((m+2n)\rho - \rho^2)\log^2(m+n)\right),$$

where the $p_{ij}$ are the sampling probabilities defined earlier. We apply the standard Hoeffding inequality to bound the actual sample size, and Theorem 2 follows as a corollary of Theorem 1.
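The row-space argument can be illustrated numerically. After the unsampled rows are zeroed out, the column leverage scores computed from $\mathcal{S}_\Gamma(M)$ coincide with those of $M$, provided $\mathcal{S}_\Gamma(M)$ keeps rank $\rho$. In this sketch a fixed set of rows stands in for the random row sample $\Gamma$ of Algorithm 1 so that the outcome is deterministic. The helper `column_leverage` is hypothetical.

```python
import numpy as np

def column_leverage(A, rank):
    """nu_j * rho / n = ||V^T e_j||^2 using the top-`rank` right singular vectors of A."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:rank] ** 2, axis=0)

rng = np.random.default_rng(3)
m, n, rho = 40, 30, 3
M = rng.standard_normal((m, rho)) @ rng.standard_normal((rho, n))   # rank-rho target

gamma = np.arange(0, m, 5)          # stand-in for the sampled row set Gamma
S_M = np.zeros_like(M)
S_M[gamma] = M[gamma]               # S_Gamma(M): rows outside Gamma set to zero

print(np.max(np.abs(column_leverage(S_M, rho) - column_leverage(M, rho))))
```

The scores agree because they are the diagonal of the projector onto the common row space, and that projector does not depend on which rank-$\rho$ matrix spans it.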


6 Proof of Optimality Conditions in Proposition 1

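Let $M$ be the low-rank target matrix with rank-$\rho$ SVD $M = U\Sigma V^T$. We want to show that any perturbation $Z$ to $M$ such that $M + Z$ is feasible for (1) strictly increases the nuclear norm, unless $Z = 0$. Now, $M + Z$ is feasible only if $\mathcal{P}_\Omega(M + Z) = \mathcal{P}_\Omega(M)$, which implies $\mathcal{P}_\Omega(Z) = 0$, i.e., $Z$ is in the null space of the operator $\mathcal{R}_\Omega$. We can choose $U_\perp$ and $V_\perp$ such that $[U, U_\perp]$ and $[V, V_\perp]$ are unitary matrices for which $\langle U_\perp V_\perp^T, \mathcal{P}_{T^\perp}(Z)\rangle = \|\mathcal{P}_{T^\perp}(Z)\|_*$. Since $\|UV^T + U_\perp V_\perp^T\|_2 = 1$, the standard duality inequality for the trace norm gives, for any $Y$ in the range of $\mathcal{R}_\Omega$,

$$\begin{aligned}
\|M + Z\|_* &\ge \langle UV^T + U_\perp V_\perp^T, M + Z\rangle \\
&= \|M\|_* + \langle UV^T + U_\perp V_\perp^T, Z\rangle \\
&= \|M\|_* + \langle UV^T - \mathcal{P}_T(Y), \mathcal{P}_T(Z)\rangle + \langle U_\perp V_\perp^T - \mathcal{P}_{T^\perp}(Y), \mathcal{P}_{T^\perp}(Z)\rangle \\
&\overset{(a)}{\ge} \|M\|_* - \|UV^T - \mathcal{P}_T(Y)\|_F\,\|\mathcal{P}_T(Z)\|_F + \left(1 - \|\mathcal{P}_{T^\perp}(Y)\|_2\right)\|\mathcal{P}_{T^\perp}(Z)\|_* \\
&\overset{(b)}{\ge} \|M\|_* + \left(1 - \|\mathcal{P}_{T^\perp}(Y)\|_2 - \frac{\|UV^T - \mathcal{P}_T(Y)\|_F\,\max_{i,j}\frac{1}{\sqrt{q_{ij}}}}{\sqrt{1 - \|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T\|_{op}}}\right)\|\mathcal{P}_{T^\perp}(Z)\|_* \\
&> \|M\|_*.
\end{aligned}$$

Here the second equality uses $\langle Y, Z\rangle = 0$, which holds because $Y$ is supported on $\Omega$ while $Z$ vanishes on $\Omega$. Above, (a) follows from the von Neumann trace inequality, and (b) follows from Lemma 6 together with $\|X\|_F \le \|X\|_*$. Using $\max_{i,j}\frac{1}{\sqrt{q_{ij}}} \le (mn)^{5/2} \le (m+n)^5$ and the conditions in Proposition 1, we derive the final inequality, which is strict whenever $\mathcal{P}_{T^\perp}(Z) \ne 0$. Note that Condition 1 in Proposition 1 implies that $\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T$ is invertible on the subspace $T$; by Lemma 6, $\mathcal{P}_{T^\perp}(Z) = 0$ then implies $\mathcal{P}_T(Z) = 0$, and hence $Z = 0$.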


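The following lemma is similar to Lemma 13 of [5].

Lemma 6 For any $Z \in \mathbb{R}^{m\times n}$ such that $\mathcal{P}_\Omega(Z) = 0$,

$$\|\mathcal{P}_T(Z)\|_F \le \left(1 - \|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T\|_{op}\right)^{-1/2}\left(\max_{i,j}\frac{1}{\sqrt{q_{ij}}}\right)\|\mathcal{P}_{T^\perp}(Z)\|_F.$$

Proof: Let us define the operator $\mathcal{R}_\Omega^{1/2}: \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$ as

$$\mathcal{R}_\Omega^{1/2}(Z) = \sum_{i,j}\frac{\delta_{ij}}{\sqrt{q_{ij}}}Z_{ij}\,e_ie_j^T,$$

where $\delta_{ij} = \mathbb{I}((i,j)\in\Omega)$. Note that $\mathcal{R}_\Omega^{1/2}$ is self-adjoint, and $\mathcal{R}_\Omega^{1/2}\mathcal{R}_\Omega^{1/2} = \mathcal{R}_\Omega$. Therefore, we have $\|\mathcal{R}_\Omega^{1/2}(X)\|_F^2 = \langle\mathcal{R}_\Omega(X), X\rangle$ for any $X$. Also, we have

$$\begin{aligned}
\|\mathcal{R}_\Omega^{1/2}\mathcal{P}_T(Z)\|_F^2 &= \langle\mathcal{R}_\Omega\mathcal{P}_T(Z), \mathcal{P}_T(Z)\rangle = \langle\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(Z), \mathcal{P}_T(Z)\rangle \\
&= \langle\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T(Z) - \mathcal{P}_T(Z), \mathcal{P}_T(Z)\rangle + \langle\mathcal{P}_T(Z), \mathcal{P}_T(Z)\rangle \\
&\ge \left(1 - \|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T\|_{op}\right)\|\mathcal{P}_T(Z)\|_F^2.
\end{aligned}\tag{17}$$

Moreover, $\mathcal{R}_\Omega^{1/2}(Z) = 0$ for any $Z$ such that $\mathcal{P}_\Omega(Z) = 0$. It follows that

$$0 = \|\mathcal{R}_\Omega^{1/2}(Z)\|_F \ge \|\mathcal{R}_\Omega^{1/2}\mathcal{P}_T(Z)\|_F - \|\mathcal{R}_\Omega^{1/2}\mathcal{P}_{T^\perp}(Z)\|_F,$$

and hence

$$\|\mathcal{R}_\Omega^{1/2}\mathcal{P}_T(Z)\|_F \le \|\mathcal{R}_\Omega^{1/2}\mathcal{P}_{T^\perp}(Z)\|_F \le \left(\max_{i,j}\frac{1}{\sqrt{q_{ij}}}\right)\|\mathcal{P}_{T^\perp}(Z)\|_F, \tag{18}$$

where we use

$$\|\mathcal{R}_\Omega^{1/2}\mathcal{P}_{T^\perp}(Z)\|_F = \Big\|\sum_{i,j}\frac{\delta_{ij}}{\sqrt{q_{ij}}}\langle e_ie_j^T, \mathcal{P}_{T^\perp}(Z)\rangle\,e_ie_j^T\Big\|_F \le \max_{i,j}\frac{1}{\sqrt{q_{ij}}}\,\|\mathcal{P}_{T^\perp}(Z)\|_F.$$

Combining (17) and (18),

$$\sqrt{1 - \|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T\|_{op}}\;\|\mathcal{P}_T(Z)\|_F \le \left(\max_{i,j}\frac{1}{\sqrt{q_{ij}}}\right)\|\mathcal{P}_{T^\perp}(Z)\|_F,$$

and, using $\|X\|_F \le \|X\|_*$, the right-hand side is at most $\left(\max_{i,j}\frac{1}{\sqrt{q_{ij}}}\right)\|\mathcal{P}_{T^\perp}(Z)\|_*$. The result follows.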


7 Proof of Technical Lemmas 

Here we prove Lemmas 1 through 5 using the matrix Bernstein inequality of Lemma 8 as the main tool. Also, we frequently use the fact in (20) and the result in Lemma 7. Note that $\mathcal{P}_T$ is a self-adjoint linear operator. Thus we can write the following for any $X \in \mathbb{R}^{m\times n}$:

$$\mathcal{P}_T(X) = \sum_{i,j}\langle\mathcal{P}_T(X), e_ie_j^T\rangle e_ie_j^T = \sum_{i,j}\langle\mathcal{P}_T(X), \mathcal{P}_T(e_ie_j^T)\rangle e_ie_j^T = \sum_{i,j}\langle X, \mathcal{P}_T(e_ie_j^T)\rangle e_ie_j^T. \tag{19}$$


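We can derive, for all $i$ and $j$,

$$\|\mathcal{P}_T(e_ie_j^T)\|_F^2 = \langle\mathcal{P}_T(e_ie_j^T), e_ie_j^T\rangle = \frac{\mu_i\rho}{m} + \frac{\nu_j\rho}{n} - \frac{\mu_i\rho}{m}\cdot\frac{\nu_j\rho}{n}. \tag{20}$$

Also, we know that for all $i, j$,

$$0 \le \frac{\mu_i\rho}{m} \le \sqrt{\frac{\mu_i\rho}{m}} \le 1, \qquad 0 \le \frac{\nu_j\rho}{n} \le \sqrt{\frac{\nu_j\rho}{n}} \le 1. \tag{21}$$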
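Identity (20) and the bounds (21) can be checked by applying $\mathcal{P}_T$ to every basis matrix $e_ie_j^T$. In this small sketch we assume orthonormal $U$, $V$ and take $\mu_i\rho/m$ and $\nu_j\rho/n$ as the diagonals of $UU^T$ and $VV^T$. The names are ours.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, rho = 6, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((m, rho)))
V, _ = np.linalg.qr(rng.standard_normal((n, rho)))
PU, PV = U @ U.T, V @ V.T

def P_T(X):
    """Projection onto the tangent space T at UV^T."""
    return PU @ X + X @ PV - PU @ X @ PV

x, y = np.diag(PU), np.diag(PV)     # mu_i*rho/m and nu_j*rho/n
norms2 = np.empty((m, n))
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = 1.0
        norms2[i, j] = np.linalg.norm(P_T(E)) ** 2   # ||P_T(e_i e_j^T)||_F^2

predicted = x[:, None] + y[None, :] - np.outer(x, y)
print(np.max(np.abs(norms2 - predicted)))
```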


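Lemma 7 Using our notation, for all $i, j$,

$$\frac{\mu_i\rho}{m} + \frac{\nu_j\rho}{n} - \frac{\mu_i\rho}{m}\cdot\frac{\nu_j\rho}{n} \ge \sqrt{\frac{\mu_i\rho}{m}}\sqrt{\frac{\nu_j\rho}{n}} \ge \frac{\mu_i\rho}{m}\cdot\frac{\nu_j\rho}{n}.$$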
Proof: Let $x = \frac{\mu_i\rho}{m}$ and $y = \frac{\nu_j\rho}{n}$. Then,

$$\begin{aligned}
(x + y - xy)^2 &= xy + (x^2 - x^2y) + (y^2 - xy^2) + x^2y^2 + xy - x^2y - xy^2 \\
&= xy + x^2(1 - y) + y^2(1 - x) + xy(1 - x)(1 - y) \\
&\ge xy \quad \text{using (21)}.
\end{aligned}$$

Also, $x + y - xy \ge 0$. Thus, $x + y - xy \ge \sqrt{xy} \ge xy$. The last inequality follows because $0 \le x, y \le 1$.
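A numerical spot check of the algebraic identity and the inequality chain in Lemma 7, on random points of $[0,1]^2$ plus the four corners. This is a sanity check of the algebra, not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.concatenate([rng.random(10_000), [0.0, 1.0, 0.0, 1.0]])
y = np.concatenate([rng.random(10_000), [0.0, 0.0, 1.0, 1.0]])

lhs = (x + y - x * y) ** 2
rhs = x * y + x**2 * (1 - y) + y**2 * (1 - x) + x * y * (1 - x) * (1 - y)
middle = np.sqrt(x * y)
print(np.max(np.abs(lhs - rhs)))
```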


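Lemma 8 (Theorem 16 of [?]) Let $X_1, \ldots, X_N \in \mathbb{R}^{m\times n}$ be independent, zero-mean random matrices. Suppose

$$\max\left\{\Big\|\sum_{t=1}^{N}\mathbb{E}[X_tX_t^T]\Big\|_2,\ \Big\|\sum_{t=1}^{N}\mathbb{E}[X_t^TX_t]\Big\|_2\right\} \le \sigma^2$$

and $\|X_t\|_2 \le \gamma$ almost surely for all $t$. Then for any $c > 0$, we have

$$\Big\|\sum_{t=1}^{N}X_t\Big\|_2 \le 2\sqrt{c\sigma^2\log(m+n)} + c\gamma\log(m+n)$$

with probability at least $1 - (m+n)^{-(c-1)}$.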

We consider sampling probabilities $\{q_{ij}\}$ of the form (10) to prove Lemmas 1 through 4.

Notation Overloading: For simplicity, we reuse some of the notation in Sections 7.1 through 7.4. Specifically, we replace $\Omega_k$ by $\Omega$ to denote a sample set in (12), and $\delta_{ij} = \mathbb{I}((i,j) \in \Omega)$.


7.1 Proof of Lemma 1

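For any matrix $Z \in \mathbb{R}^{m\times n}$, we can write

$$(\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T)(Z) = \sum_{i,j}\left(\frac{\delta_{ij}}{q_{ij}} - 1\right)\langle\mathcal{P}_T(e_ie_j^T), Z\rangle\,\mathcal{P}_T(e_ie_j^T) =: \sum_{i,j}\mathcal{S}_{ij}(Z).$$

Using $\mathbb{E}[\delta_{ij}] = q_{ij}$, we have $\mathbb{E}[\mathcal{S}_{ij}(Z)] = 0$ for any $Z$. Thus, we conclude that $\mathbb{E}[\mathcal{S}_{ij}] = 0$. Also, the $\mathcal{S}_{ij}$'s are independent of each other. Using the probabilities in (10) (the $\mathcal{S}_{ij}$'s vanish when $q_{ij} = 1$), for all $Z$ and $(i,j)$, and (20), we derive

$$\|\mathcal{S}_{ij}(Z)\|_F \le \frac{1}{q_{ij}}\|\mathcal{P}_T(e_ie_j^T)\|_F^2\,\|Z\|_F \le \frac{\|Z\|_F}{c_0\log(m+n)}.$$

From the definition of the operator norm, $\|\mathcal{S}_{ij}\|_{op} \le \frac{1}{c_0\log(m+n)}$. Also, we derive

$$\mathbb{E}[\mathcal{S}_{ij}^2(Z)] = \mathbb{E}\left[\left(\frac{\delta_{ij}}{q_{ij}} - 1\right)^2\right]\langle e_ie_j^T, \mathcal{P}_T(Z)\rangle\langle e_ie_j^T, \mathcal{P}_T(e_ie_j^T)\rangle\,\mathcal{P}_T(e_ie_j^T) = \frac{1 - q_{ij}}{q_{ij}}\|\mathcal{P}_T(e_ie_j^T)\|_F^2\,\langle e_ie_j^T, \mathcal{P}_T(Z)\rangle\,\mathcal{P}_T(e_ie_j^T),$$

and therefore

$$\Big\|\sum_{i,j}\mathbb{E}[\mathcal{S}_{ij}^2(Z)]\Big\|_F \le \Big\|\sum_{i,j}\frac{1 - q_{ij}}{q_{ij}}\|\mathcal{P}_T(e_ie_j^T)\|_F^2\,\langle e_ie_j^T, \mathcal{P}_T(Z)\rangle\,e_ie_j^T\Big\|_F \le \left(\max_{i,j}\frac{1 - q_{ij}}{q_{ij}}\|\mathcal{P}_T(e_ie_j^T)\|_F^2\right)\|\mathcal{P}_T(Z)\|_F.$$

Hence

$$\Big\|\sum_{i,j}\mathbb{E}[\mathcal{S}_{ij}^2]\Big\|_{op} \le \max_{i,j}\frac{1 - q_{ij}}{q_{ij}}\|\mathcal{P}_T(e_ie_j^T)\|_F^2 \le \frac{1}{c_0\log(m+n)}.$$

We apply the matrix Bernstein inequality in Lemma 8 using

$$\sigma^2 = \gamma = \frac{1}{c_0\log(m+n)}$$

to obtain, for any $c > 1$ and $c_0 \ge 20c$,

$$\|\mathcal{P}_T\mathcal{R}_\Omega\mathcal{P}_T - \mathcal{P}_T\|_{op} \le 2\sqrt{\frac{c}{c_0}} + \frac{c}{c_0} \le \frac{1}{2},$$

holding with probability at least $1 - (m+n)^{1-c}$.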
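Each concentration step reduces to plugging $\sigma^2$ and $\gamma$ into Lemma 8. The sketch below encodes the right-hand side of Lemma 8 and applies it with the Lemma 1 choice $\sigma^2 = \gamma = 1/(c_0\log(m+n))$. The bound becomes $2\sqrt{c/c_0} + c/c_0$, independent of $m$ and $n$, and $c_0 = 20c$ is the smallest integer multiple that meets $1/2$. The function names and the default values of `c`, `m` and `n` are illustrative.

```python
import math

def bernstein_bound(sigma2, gamma, c, m, n):
    """Right-hand side of Lemma 8: 2*sqrt(c*sigma^2*log(m+n)) + c*gamma*log(m+n)."""
    L = math.log(m + n)
    return 2 * math.sqrt(c * sigma2 * L) + c * gamma * L

def lemma1_bound(ratio, c=2.0, m=100, n=80):
    """Plug sigma^2 = gamma = 1/(c0*log(m+n)) with c0 = ratio * c into Lemma 8."""
    c0 = ratio * c
    s = 1.0 / (c0 * math.log(m + n))
    return bernstein_bound(s, s, c, m, n)

print(lemma1_bound(20), lemma1_bound(19))
```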


7.2 Proof of Lemma 2

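We can write the matrix $(\mathcal{R}_\Omega - \mathcal{I})Z$ as a sum of independent matrices:

$$(\mathcal{R}_\Omega - \mathcal{I})Z = \sum_{i,j}\left(\frac{\delta_{ij}}{q_{ij}} - 1\right)Z_{ij}\,e_ie_j^T = \sum_{i,j}S_{ij}.$$

We note that $\mathbb{E}[S_{ij}] = 0$, and the $S_{ij}$'s are zero matrices when $q_{ij} = 1$, for all $(i,j)$. We have $\|S_{ij}\|_2 \le \frac{|Z_{ij}|}{q_{ij}}$. Moreover,

$$\Big\|\sum_{i,j}\mathbb{E}[S_{ij}S_{ij}^T]\Big\|_2 = \Big\|\sum_{i,j}Z_{ij}^2\,e_ie_i^T\,\mathbb{E}\Big[\Big(\frac{\delta_{ij}}{q_{ij}} - 1\Big)^2\Big]\Big\|_2 = \Big\|\sum_{i}\Big(\sum_{j}\frac{1 - q_{ij}}{q_{ij}}Z_{ij}^2\Big)e_ie_i^T\Big\|_2 = \max_i\sum_{j=1}^{n}\frac{1 - q_{ij}}{q_{ij}}Z_{ij}^2.$$

Similarly,

$$\Big\|\sum_{i,j}\mathbb{E}[S_{ij}^TS_{ij}]\Big\|_2 = \max_j\sum_{i=1}^{m}\frac{1 - q_{ij}}{q_{ij}}Z_{ij}^2.$$

Clearly, when $q_{ij} = 1$ the above quantities are zero. Using $q_{ij}$ in (10) and Lemma 7, we have

$$\|S_{ij}\|_2 \le \frac{|Z_{ij}|}{q_{ij}} \le \frac{|Z_{ij}|}{c_0\log(m+n)\sqrt{\frac{\mu_i\rho}{m}}\sqrt{\frac{\nu_j\rho}{n}}} \le \frac{\|Z\|_{\mu(\infty)}}{c_0\log(m+n)}.$$

Using $q_{ij}$ in (10), and noting that $\frac{\mu_i\rho}{m} + \frac{\nu_j\rho}{n} - \frac{\mu_i\rho}{m}\cdot\frac{\nu_j\rho}{n} \ge \frac{\mu_i\rho}{m}$, we have

$$\sum_{j=1}^{n}\frac{1 - q_{ij}}{q_{ij}}Z_{ij}^2 \le \sum_{j=1}^{n}\frac{Z_{ij}^2}{c_0\log(m+n)\frac{\mu_i\rho}{m}} \le \frac{\|Z\|_{\mu(\infty,2)}^2}{c_0\log(m+n)}.$$

Similarly, using the lower bound $\frac{\nu_j\rho}{n}$ instead,

$$\sum_{i=1}^{m}\frac{1 - q_{ij}}{q_{ij}}Z_{ij}^2 \le \sum_{i=1}^{m}\frac{Z_{ij}^2}{c_0\log(m+n)\frac{\nu_j\rho}{n}} \le \frac{\|Z\|_{\mu(\infty,2)}^2}{c_0\log(m+n)}.$$

The lemma follows from the matrix Bernstein inequality in Lemma 8 with

$$\gamma\log(m+n) \le \frac{1}{c_0}\|Z\|_{\mu(\infty)}, \qquad \sigma^2\log(m+n) \le \frac{1}{c_0}\|Z\|_{\mu(\infty,2)}^2,$$

which gives $\|(\mathcal{R}_\Omega - \mathcal{I})Z\|_2 \le 2\sqrt{\frac{c}{c_0}}\|Z\|_{\mu(\infty,2)} + \frac{c}{c_0}\|Z\|_{\mu(\infty)}$.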
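The deterministic inputs to Bernstein in this proof can be checked on a random instance. The row and column variance sums must be at most $\|Z\|_{\mu(\infty,2)}^2/(c_0\log(m+n))$, and the per-term bound $|Z_{ij}|/q_{ij}$ over entries with $q_{ij} < 1$ must be at most $\|Z\|_{\mu(\infty)}/(c_0\log(m+n))$. This sketch assumes the capped form $q_{ij} = \min\{c_0(\cdot)\log(m+n), 1\}$ as our reading of (10), with strictly positive leverage scores. The value `c0 = 0.5` is chosen small only so that most $q_{ij} < 1$.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, rho, c0 = 30, 20, 2, 0.5
U, _ = np.linalg.qr(rng.standard_normal((m, rho)))
V, _ = np.linalg.qr(rng.standard_normal((n, rho)))
x, y = np.sum(U**2, axis=1), np.sum(V**2, axis=1)
L = np.log(m + n)
q = np.minimum(c0 * (x[:, None] + y[None, :] - np.outer(x, y)) * L, 1.0)

Z = rng.standard_normal((m, n))
weight = (1 - q) / q                          # E[(delta_ij/q_ij - 1)^2]; zero where q_ij = 1
row_var = (weight * Z**2).sum(axis=1).max()   # ||sum E[S S^T]||_2
col_var = (weight * Z**2).sum(axis=0).max()   # ||sum E[S^T S]||_2
gamma = np.max(np.where(q < 1, np.abs(Z) / q, 0.0))   # max |Z_ij|/q_ij over q_ij < 1

z_inf = np.max(np.abs(Z) / np.sqrt(np.outer(x, y)))
z_inf2 = max((np.linalg.norm(Z, axis=1) / np.sqrt(x)).max(),
             (np.linalg.norm(Z, axis=0) / np.sqrt(y)).max())
print(row_var, col_var, z_inf2**2 / (c0 * L))
```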


7.3 Proof of Lemma 3

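Let

$$X = (\mathcal{P}_T\mathcal{R}_\Omega - \mathcal{P}_T)Z = \sum_{i,j}\left(\frac{\delta_{ij}}{q_{ij}} - 1\right)Z_{ij}\,\mathcal{P}_T(e_ie_j^T).$$

The weighted $b$-th column of $X$ can be written as a sum of independent, zero-mean column vectors:

$$\sqrt{\frac{n}{\nu_b\rho}}X_{*,b} = \sum_{i,j}\left(\frac{\delta_{ij}}{q_{ij}} - 1\right)Z_{ij}\left(\mathcal{P}_T(e_ie_j^T)e_b\right)\sqrt{\frac{n}{\nu_b\rho}} = \sum_{i,j}s_{ij}.$$

Clearly, $\mathbb{E}[s_{ij}] = 0$. We need bounds on $\|s_{ij}\|_2$ and $\left|\mathbb{E}\left[\sum_{i,j}s_{ij}^Ts_{ij}\right]\right|$ to apply the matrix Bernstein inequality. First, we need to bound $\|\mathcal{P}_T(e_ie_j^T)e_b\|_2$:

$$\|\mathcal{P}_T(e_ie_j^T)e_b\|_2 = \|UU^T(e_ie_j^T)e_b + (e_ie_j^T)VV^Te_b - UU^T(e_ie_j^T)VV^Te_b\|_2 \le \begin{cases}\left\|UU^Te_i + (I - UU^T)e_i\|V^Te_b\|_2^2\right\|_2 \le \sqrt{\frac{\mu_i\rho}{m}} + \sqrt{\frac{\nu_b\rho}{n}}, & j = b,\\[4pt] \|(I - UU^T)e_i\,e_j^TVV^Te_b\|_2 \le |e_j^TVV^Te_b|, & j \ne b.\end{cases}\tag{22}$$

Above we use the triangle inequality and the definitions of $\mu_i$ and $\nu_b$. Note that $s_{ij}$ is a zero vector when $q_{ij} = 1$, for all $(i,j)$. Otherwise, for $q_{ij} \ne 1$, we consider two cases. Using the bounds in (22), we have for $j = b$,

$$\|s_{ib}\|_2 \le \frac{1}{q_{ib}}|Z_{ib}|\sqrt{\frac{n}{\nu_b\rho}}\left(\sqrt{\frac{\mu_i\rho}{m}} + \sqrt{\frac{\nu_b\rho}{n}}\right).$$

Using $q_{ij}$ in (10), $q_{ib} \ge c_0\log(m+n)\frac{\mu_i\rho}{m}$ and, by Lemma 7, $q_{ib} \ge c_0\log(m+n)\sqrt{\frac{\mu_i\rho}{m}}\sqrt{\frac{\nu_b\rho}{n}}$. Combining these two inequalities, we have

$$\|s_{ib}\|_2 \le \frac{|Z_{ib}|}{c_0\log(m+n)}\sqrt{\frac{n}{\nu_b\rho}}\left(\sqrt{\frac{m}{\mu_i\rho}} + \sqrt{\frac{m}{\mu_i\rho}}\right) \le \frac{2}{c_0\log(m+n)}\|Z\|_{\mu(\infty)}.$$

For $j \ne b$, using $q_{ij} \ge c_0\log(m+n)\sqrt{\frac{\mu_i\rho}{m}}\sqrt{\frac{\nu_j\rho}{n}}$ (Lemma 7) and $|e_j^TVV^Te_b| \le \|V^Te_j\|_2\|V^Te_b\|_2 = \sqrt{\frac{\nu_j\rho}{n}}\sqrt{\frac{\nu_b\rho}{n}}$,

$$\|s_{ij}\|_2 \le \frac{|Z_{ij}|}{q_{ij}}|e_j^TVV^Te_b|\sqrt{\frac{n}{\nu_b\rho}} \le \frac{|Z_{ij}|}{c_0\log(m+n)}\sqrt{\frac{m}{\mu_i\rho}} \le \frac{\|Z\|_{\mu(\infty)}}{c_0\log(m+n)},$$

where the last step uses $\sqrt{n/(\nu_j\rho)} \ge 1$. Therefore, for all $(i,j)$, we have $\|s_{ij}\|_2 \le \frac{2}{c_0\log(m+n)}\|Z\|_{\mu(\infty)}$.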

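On the other hand,

$$\Big|\mathbb{E}\Big[\sum_{i,j}s_{ij}^Ts_{ij}\Big]\Big| = \sum_{i,j}\frac{1 - q_{ij}}{q_{ij}}Z_{ij}^2\,\|\mathcal{P}_T(e_ie_j^T)e_b\|_2^2\cdot\frac{n}{\nu_b\rho} = \sum_{j=b,\,i}(\cdot) + \sum_{j\ne b,\,i}(\cdot).$$

The above quantity is zero for $q_{ij} = 1$. Otherwise, for $q_{ij} \ne 1$, we consider two cases. For $j = b$, using (22), we have $\|\mathcal{P}_T(e_ie_b^T)e_b\|_2^2 \le \left(\sqrt{\frac{\mu_i\rho}{m}} + \sqrt{\frac{\nu_b\rho}{n}}\right)^2 \le 2\left(\frac{\mu_i\rho}{m} + \frac{\nu_b\rho}{n}\right)$. Using $q_{ib}$ in (10), we have

$$\sum_{i}\frac{1}{q_{ib}}Z_{ib}^2\cdot 2\left(\frac{\mu_i\rho}{m} + \frac{\nu_b\rho}{n}\right)\cdot\frac{n}{\nu_b\rho} \le \frac{4}{c_0\log(m+n)}\sum_{i}Z_{ib}^2\,\frac{n}{\nu_b\rho} \le \frac{4}{c_0\log(m+n)}\|Z\|_{\mu(\infty,2)}^2,$$

where we use the following bound, valid for all $(i,j)$:

$$\frac{\frac{\mu_i\rho}{m} + \frac{\nu_j\rho}{n}}{\frac{\mu_i\rho}{m} + \frac{\nu_j\rho}{n} - \frac{\mu_i\rho}{m}\cdot\frac{\nu_j\rho}{n}} = 1 + \frac{\frac{\mu_i\rho}{m}\cdot\frac{\nu_j\rho}{n}}{\frac{\mu_i\rho}{m} + \frac{\nu_j\rho}{n} - \frac{\mu_i\rho}{m}\cdot\frac{\nu_j\rho}{n}} \le 1 + \frac{\frac{\mu_i\rho}{m}\cdot\frac{\nu_j\rho}{n}}{\max\left\{\frac{\mu_i\rho}{m}, \frac{\nu_j\rho}{n}\right\}} \le 2.$$

For $j \ne b$, using $q_{ij} \ge c_0\log(m+n)\frac{\nu_j\rho}{n}$ and the bound in (22), we have

$$\sum_{j\ne b,\,i}\frac{1}{q_{ij}}Z_{ij}^2\,|e_j^TVV^Te_b|^2\cdot\frac{n}{\nu_b\rho} \le \frac{n}{\nu_b\rho}\sum_{j\ne b}|e_j^TVV^Te_b|^2\sum_{i}\frac{Z_{ij}^2}{c_0\log(m+n)\frac{\nu_j\rho}{n}} \le \frac{\|Z\|_{\mu(\infty,2)}^2}{c_0\log(m+n)}\cdot\frac{n}{\nu_b\rho}\sum_{j\ne b}|e_j^TVV^Te_b|^2 \le \frac{\|Z\|_{\mu(\infty,2)}^2}{c_0\log(m+n)},$$

where the last inequality follows from $\sum_{j\ne b}|e_j^TVV^Te_b|^2 \le \|VV^Te_b\|_2^2 = \frac{\nu_b\rho}{n}$. Combining the two summations,

$$\Big|\mathbb{E}\Big[\sum_{i,j}s_{ij}^Ts_{ij}\Big]\Big| \le \frac{5}{c_0\log(m+n)}\|Z\|_{\mu(\infty,2)}^2.$$

We can bound $\Big\|\mathbb{E}\Big[\sum_{i,j}s_{ij}s_{ij}^T\Big]\Big\|_2$ in a similar way.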


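We apply the matrix Bernstein inequality in Lemma 8 with

$$\gamma = \frac{2}{c_0\log(m+n)}\|Z\|_{\mu(\infty)}, \qquad \sigma^2 = \frac{5}{c_0\log(m+n)}\|Z\|_{\mu(\infty,2)}^2,$$

to obtain

$$\Big\|\sqrt{\frac{n}{\nu_b\rho}}X_{*,b}\Big\|_2 = \Big\|\sum_{i,j}s_{ij}\Big\|_2 \le \sqrt{\frac{20c}{c_0}}\|Z\|_{\mu(\infty,2)} + \frac{2c}{c_0}\|Z\|_{\mu(\infty)}.$$

We set $c_0 \ge 80c$ to derive

$$\Big\|\sqrt{\frac{n}{\nu_b\rho}}X_{*,b}\Big\|_2 \le \frac{1}{2}\left(\|Z\|_{\mu(\infty,2)} + \|Z\|_{\mu(\infty)}\right).$$

Similarly, we can bound $\Big\|\sqrt{\frac{m}{\mu_a\rho}}X_{a,*}\Big\|_2$ by the same quantity. We take a union bound over all rows $a$ and all columns $b$ (i.e., a total of $m + n$ events) to obtain, for any $c > 2$,

$$\|(\mathcal{P}_T\mathcal{R}_\Omega - \mathcal{P}_T)(Z)\|_{\mu(\infty,2)} \le \frac{1}{2}\left(\|Z\|_{\mu(\infty,2)} + \|Z\|_{\mu(\infty)}\right),$$

holding with probability at least $1 - (m+n)^{2-c}$.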
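Two numerical ingredients of this proof can be spot-checked. The first is the ratio bound $(x + y)/(x + y - xy) \le 2$ used in the variance step, sampled over $(0,1]^2$. The second is that $c_0 = 80c$ turns $\sqrt{20c/c_0}$ into exactly $1/2$ and $2c/c_0$ into $1/40$. This is a sketch: `c = 3.0` is an arbitrary illustrative value, and $x$, $y$ play the roles of $\mu_i\rho/m$ and $\nu_j\rho/n$.

```python
import math
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(1e-6, 1.0, 100_000)   # plays mu_i*rho/m
y = rng.uniform(1e-6, 1.0, 100_000)   # plays nu_j*rho/n

ratio = (x + y) / (x + y - x * y)     # bounded by 2 in the variance step of Lemma 3
print(ratio.max())

# Bernstein coefficients sqrt(20c/c0) and 2c/c0 with c0 = 80c.
c = 3.0
c0 = 80 * c
coef_inf2, coef_inf = math.sqrt(20 * c / c0), 2 * c / c0
```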

7.4 Proof of Lemma 4

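Let $X = (\mathcal{P}_T\mathcal{R}_\Omega - \mathcal{P}_T)Z = \sum_{i,j}\left(\frac{\delta_{ij}}{q_{ij}} - 1\right)Z_{ij}\,\mathcal{P}_T(e_ie_j^T)$. We write the rescaled $(a,b)$-th element of $X$ as

$$\sqrt{\frac{m}{\mu_a\rho}}X_{ab}\sqrt{\frac{n}{\nu_b\rho}} = \sum_{i,j}\left(\frac{\delta_{ij}}{q_{ij}} - 1\right)Z_{ij}\,\langle e_ae_b^T, \mathcal{P}_T(e_ie_j^T)\rangle\sqrt{\frac{m}{\mu_a\rho}}\sqrt{\frac{n}{\nu_b\rho}} = \sum_{i,j}s_{ij}.$$

This is a sum of independent, zero-mean random variables; we seek to bound $|s_{ij}|$ and $\left|\sum_{i,j}\mathbb{E}[s_{ij}^2]\right|$. First, we need to bound $|\langle e_ae_b^T, \mathcal{P}_T(e_ie_j^T)\rangle|$:

$$\begin{aligned}
|\langle e_ae_b^T, \mathcal{P}_T(e_ie_j^T)\rangle| &= |e_a^TUU^T(e_ie_j^T)e_b + e_a^T(e_ie_j^T)VV^Te_b - e_a^TUU^T(e_ie_j^T)VV^Te_b| \\
&= \begin{cases}\|\mathcal{P}_T(e_ae_b^T)\|_F^2 = \frac{\mu_a\rho}{m} + \frac{\nu_b\rho}{n} - \frac{\mu_a\rho}{m}\cdot\frac{\nu_b\rho}{n}, & i = a,\ j = b,\\[4pt] |e_a^T(I - UU^T)e_a\,e_j^TVV^Te_b| \le |e_j^TVV^Te_b|, & i = a,\ j \ne b,\\[4pt] |e_a^TUU^Te_i\,e_b^T(I - VV^T)e_b| \le |e_a^TUU^Te_i|, & i \ne a,\ j = b,\\[4pt] |e_a^TUU^Te_i\,e_j^TVV^Te_b| \le |e_a^TUU^Te_i|\,|e_j^TVV^Te_b|, & i \ne a,\ j \ne b,\end{cases}
\end{aligned}\tag{23}$$

where we use $\|I - UU^T\|_2 \le 1$ and $\|I - VV^T\|_2 \le 1$.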
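The four cases of (23) can be verified exhaustively on a small instance. The check computes $\langle e_ae_b^T, \mathcal{P}_T(e_ie_j^T)\rangle$ by applying $\mathcal{P}_T$ explicitly and compares it with the matching right-hand side. This is a sketch with hypothetical helper names, and it assumes orthonormal $U$ and $V$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(8)
m, n, rho = 5, 4, 2
U, _ = np.linalg.qr(rng.standard_normal((m, rho)))
V, _ = np.linalg.qr(rng.standard_normal((n, rho)))
PU, PV = U @ U.T, V @ V.T

def inner(a, b, i, j):
    """<e_a e_b^T, P_T(e_i e_j^T)>, computed by applying P_T explicitly."""
    E = np.zeros((m, n))
    E[i, j] = 1.0
    return (PU @ E + E @ PV - PU @ E @ PV)[a, b]

def case_bound(a, b, i, j):
    """Right-hand side of the matching case in (23)."""
    if i == a and j == b:
        return PU[a, a] + PV[b, b] - PU[a, a] * PV[b, b]
    if i == a:
        return abs(PV[j, b])
    if j == b:
        return abs(PU[a, i])
    return abs(PU[a, i]) * abs(PV[j, b])

worst = max(abs(inner(a, b, i, j)) - case_bound(a, b, i, j)
            for a, i in itertools.product(range(m), repeat=2)
            for b, j in itertools.product(range(n), repeat=2))
print(worst)   # should not exceed rounding error
```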
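Note that $s_{ij} = 0$ when $q_{ij} = 1$. Otherwise, for $q_{ij} \ne 1$,

$$|s_{ij}| \le \frac{1}{q_{ij}}|Z_{ij}|\,|\langle e_ae_b^T, \mathcal{P}_T(e_ie_j^T)\rangle|\sqrt{\frac{m}{\mu_a\rho}}\sqrt{\frac{n}{\nu_b\rho}}.$$

We consider four cases.

For $i = a$, $j = b$, using $q_{ab} \ge c_0\log(m+n)\left(\frac{\mu_a\rho}{m} + \frac{\nu_b\rho}{n} - \frac{\mu_a\rho}{m}\cdot\frac{\nu_b\rho}{n}\right)$,

$$|s_{ab}| \le \frac{1}{q_{ab}}|Z_{ab}|\,\|\mathcal{P}_T(e_ae_b^T)\|_F^2\sqrt{\frac{m}{\mu_a\rho}}\sqrt{\frac{n}{\nu_b\rho}} \le \frac{|Z_{ab}|}{c_0\log(m+n)}\sqrt{\frac{m}{\mu_a\rho}}\sqrt{\frac{n}{\nu_b\rho}} \le \frac{\|Z\|_{\mu(\infty)}}{c_0\log(m+n)}.$$

For $i = a$, $j \ne b$, using $q_{aj} \ge c_0\log(m+n)\frac{\nu_j\rho}{n}$ and $|e_j^TVV^Te_b| \le \sqrt{\frac{\nu_j\rho}{n}}\sqrt{\frac{\nu_b\rho}{n}}$,

$$|s_{aj}| \le \frac{|Z_{aj}|}{q_{aj}}\sqrt{\frac{\nu_j\rho}{n}}\sqrt{\frac{\nu_b\rho}{n}}\sqrt{\frac{m}{\mu_a\rho}}\sqrt{\frac{n}{\nu_b\rho}} \le \frac{|Z_{aj}|}{c_0\log(m+n)}\sqrt{\frac{m}{\mu_a\rho}}\sqrt{\frac{n}{\nu_j\rho}} \le \frac{\|Z\|_{\mu(\infty)}}{c_0\log(m+n)}.$$

Similarly, for $i \ne a$, $j = b$, using $q_{ib} \ge c_0\log(m+n)\frac{\mu_i\rho}{m}$,

$$|s_{ib}| \le \frac{\|Z\|_{\mu(\infty)}}{c_0\log(m+n)}.$$

For $i \ne a$, $j \ne b$, using Lemma 7, $|e_a^TUU^Te_i| \le \sqrt{\frac{\mu_a\rho}{m}}\sqrt{\frac{\mu_i\rho}{m}}$ and $|e_j^TVV^Te_b| \le \sqrt{\frac{\nu_j\rho}{n}}\sqrt{\frac{\nu_b\rho}{n}}$,

$$|s_{ij}| \le \frac{|Z_{ij}|}{q_{ij}}\sqrt{\frac{\mu_i\rho}{m}}\sqrt{\frac{\nu_j\rho}{n}} \le \frac{|Z_{ij}|}{c_0\log(m+n)} \le \frac{\|Z\|_{\mu(\infty)}}{c_0\log(m+n)}.$$

Above we use $\frac{\mu_i\rho}{m} \le \sqrt{\frac{\mu_i\rho}{m}} \le 1$ and $\frac{\nu_j\rho}{n} \le \sqrt{\frac{\nu_j\rho}{n}} \le 1$, for all $i, j$. We conclude, for all $(i,j)$, $|s_{ij}| \le \frac{\|Z\|_{\mu(\infty)}}{c_0\log(m+n)}$.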
The above quantity is zero for $q_{ij} = 1$. We bound it by considering four cases for $q_{ij} \ne 1$.

Above we use,

Combining the summations, we derive

We now apply the matrix Bernstein inequality in Lemma 8 to obtain, for any $c > 3$ and $c_0 \ge 68c$,

We take a union bound over all $(a, b)$ (i.e., a total of $mn \le (m+n)^2$ events) to conclude that the above result holds with probability at least $1 - (m+n)^{3-c}$.
7.5 Proof of Lemma 5

We apply the matrix Bernstein inequality in Lemma 8 to derive the result.